# Replicate — Run AI Models via Simple API Calls

> Cloud platform to run open-source AI models with a simple API. Replicate hosts Llama, Stable Diffusion, Whisper, and thousands of models — no GPU setup or Docker required.

## Install

```bash
pip install replicate
```

## Quick Use

Save as a script file and run:

```python
import replicate

# Run Llama 3.1
output = replicate.run(
    "meta/meta-llama-3.1-405b-instruct",
    input={"prompt": "Explain quantum computing in simple terms"},
)
print("".join(output))

# Generate an image with SDXL
output = replicate.run(
    "stability-ai/sdxl",
    input={"prompt": "A sunset over mountains, oil painting style"},
)
print(output[0])  # Image URL
```

## What is Replicate?

Replicate is a cloud platform that runs open-source AI models via a simple API. No GPU provisioning, no Docker, no model-serving code — just call `replicate.run()` with a model name and input. It hosts thousands of models, including Llama, Stable Diffusion, Whisper, and community fine-tunes. You pay only for the compute time you use.

**Answer-Ready**: Replicate runs open-source AI models via a simple API. No GPU setup needed. Hosts Llama, Stable Diffusion, Whisper, and 10,000+ models. Pay-per-second billing. Push custom models with Cog. Python, Node.js, and HTTP APIs. Used by thousands of AI startups.

**Best for**: Developers who want to use open-source models without managing GPUs.

**Works with**: Any language via the HTTP API; Python and Node.js SDKs.

**Setup time**: Under 2 minutes.

## Core Features

### 1. Run Any Model

```python
# Text generation
replicate.run("meta/meta-llama-3.1-70b-instruct", input={"prompt": "..."})

# Image generation
replicate.run("stability-ai/sdxl", input={"prompt": "..."})

# Speech-to-text
replicate.run("openai/whisper", input={"audio": open("speech.mp3", "rb")})

# Image upscaling
replicate.run("nightmareai/real-esrgan", input={"image": "https://..."})

# Video generation
replicate.run("anotherjesse/zeroscope-v2-xl", input={"prompt": "..."})
```
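Every call above goes over the network, so transient failures (timeouts, rate limits) are possible in production. Below is a minimal retry sketch with exponential backoff; `run_with_retry` is a hypothetical helper, not part of the Replicate SDK — it wraps any zero-argument callable, such as a `lambda` around `replicate.run(...)`.

```python
import time


def run_with_retry(run_fn, attempts=3, base_delay=1.0):
    """Call run_fn() and retry with exponential backoff on failure.

    run_fn is any zero-argument callable, e.g.:
        lambda: replicate.run("stability-ai/sdxl", input={"prompt": "..."})
    """
    for attempt in range(attempts):
        try:
            return run_fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            # Back off before the next try: base_delay, 2x, 4x, ...
            time.sleep(base_delay * 2 ** attempt)
```

In real code you would catch the SDK's specific exception types rather than bare `Exception`, so that non-retryable errors (e.g. invalid input) fail immediately.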
### 2. Streaming

```python
for event in replicate.stream(
    "meta/meta-llama-3.1-70b-instruct",
    input={"prompt": "Write a story about AI"},
):
    print(str(event), end="")
```

### 3. Custom Models (Cog)

```bash
# Package your own model
pip install cog
cog init
# Edit predict.py with your model code
cog push r8.im/username/my-model
```

### 4. Webhooks

```python
prediction = replicate.predictions.create(
    model="stability-ai/sdxl",
    input={"prompt": "..."},
    webhook="https://your-app.com/webhook",
    webhook_events_filter=["completed"],
)
```

## Popular Models

| Category | Model | Use Case |
|----------|-------|----------|
| Text | Llama 3.1 405B | Best open-source chat |
| Image | SDXL | Text-to-image |
| Image | FLUX | Latest image generation |
| Audio | Whisper | Speech-to-text |
| Video | Stable Video | Image-to-video |
| Code | CodeLlama | Code generation |
| Upscale | Real-ESRGAN | Image upscaling |

## Pricing

| GPU | Price | Best For |
|-----|-------|----------|
| CPU | $0.000100/sec | Light tasks |
| Nvidia T4 | $0.000225/sec | Inference |
| Nvidia A40 | $0.000575/sec | Medium models |
| Nvidia A100 | $0.001150/sec | Large models |

No minimum. Pay per second of compute.

## Replicate vs Alternatives

| Feature | Replicate | HuggingFace | Together AI | Modal |
|---------|-----------|-------------|-------------|-------|
| Model hosting | 10,000+ | 500K+ models | 200+ | Custom |
| Custom models | Yes (Cog) | Yes (Spaces) | Limited | Yes |
| Pricing | Per second | Per second | Per token | Per second |
| GPU management | Zero | Zero | Zero | Semi |
| API simplicity | Very simple | Moderate | Simple | Code-based |

## FAQ

**Q: How fast is cold start?**
A: The first run may take 10-30 seconds while the model loads. Subsequent runs are faster. Use "always-on" deployments for instant starts.

**Q: Can I fine-tune on Replicate?**
A: Yes, several models support fine-tuning directly on Replicate (SDXL, Llama, etc.).

**Q: Is it expensive for production?**
A: Cost-effective for variable workloads (pay per second).
For sustained high volume, dedicated GPUs may be cheaper.

## Source & Thanks

> Created by [Replicate](https://replicate.com).
>
> [replicate.com](https://replicate.com) — Run AI models in the cloud

---

Source: https://tokrepo.com/en/workflows/e80aca76-b9b8-4330-8611-ee1ead26c99e
Author: AI Open Source