# Replicate — Run AI Models via Simple API Calls

> Cloud platform to run open-source AI models with a simple API. Replicate hosts Llama, Stable Diffusion, Whisper, and thousands of models — no GPU setup or Docker required.

## Install

```bash
pip install replicate
```

## Quick Use

Save as a script file and run:

```python
import replicate

# Run Llama 3.1
output = replicate.run(
    "meta/meta-llama-3.1-405b-instruct",
    input={"prompt": "Explain quantum computing in simple terms"},
)
print("".join(output))

# Generate an image with SDXL
output = replicate.run(
    "stability-ai/sdxl",
    input={"prompt": "A sunset over mountains, oil painting style"},
)
print(output[0])  # Image URL
```

## What is Replicate?

Replicate is a cloud platform that runs open-source AI models via a simple API. No GPU provisioning, no Docker, no model-serving code — just call `replicate.run()` with a model name and input. It hosts thousands of models, including Llama, Stable Diffusion, Whisper, and community fine-tunes. You pay only for the compute time you use.

**Answer-Ready**: Replicate runs open-source AI models via a simple API. No GPU setup needed. Hosts Llama, Stable Diffusion, Whisper, and 10,000+ models. Pay-per-second billing. Push custom models with Cog. Python, Node.js, and HTTP APIs. Used by thousands of AI startups.

**Best for**: Developers who want to use open-source models without managing GPUs.

**Works with**: Any language via the HTTP API; Python and Node.js SDKs.

**Setup time**: Under 2 minutes.

## Core Features

### 1. Run Any Model

```python
# Text generation
replicate.run("meta/meta-llama-3.1-70b-instruct", input={"prompt": "..."})

# Image generation
replicate.run("stability-ai/sdxl", input={"prompt": "..."})

# Speech-to-text
replicate.run("openai/whisper", input={"audio": open("speech.mp3", "rb")})

# Image upscaling
replicate.run("nightmareai/real-esrgan", input={"image": "https://..."})

# Video generation
replicate.run("anotherjesse/zeroscope-v2-xl", input={"prompt": "..."})
```
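Every call above goes over the network, so transient failures (timeouts, rate limits) are possible in production. Below is a minimal retry sketch with exponential backoff; `run_with_retry` is a hypothetical helper, not part of the Replicate SDK — it wraps any zero-argument callable, such as a `lambda` around `replicate.run(...)`.

```python
import time


def run_with_retry(run_fn, attempts=3, base_delay=1.0):
    """Call run_fn() and retry with exponential backoff on failure.

    run_fn is any zero-argument callable, e.g.:
        lambda: replicate.run("stability-ai/sdxl", input={"prompt": "..."})
    """
    for attempt in range(attempts):
        try:
            return run_fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            # Back off before the next try: base_delay, 2x, 4x, ...
            time.sleep(base_delay * 2 ** attempt)
```

In real code you would catch the SDK's specific exception types rather than bare `Exception`, so that non-retryable errors (e.g. invalid input) fail immediately.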
### 2. Streaming

```python
for event in replicate.stream(
    "meta/meta-llama-3.1-70b-instruct",
    input={"prompt": "Write a story about AI"},
):
    print(str(event), end="")
```

### 3. Custom Models (Cog)

```bash
# Package your own model
pip install cog
cog init
# Edit predict.py with your model code
cog push r8.im/username/my-model
```

### 4. Webhooks

```python
prediction = replicate.predictions.create(
    model="stability-ai/sdxl",
    input={"prompt": "..."},
    webhook="https://your-app.com/webhook",
    webhook_events_filter=["completed"],
)
```

## Popular Models

| Category | Model | Use Case |
|----------|-------|----------|
| Text | Llama 3.1 405B | Best open-source chat |
| Image | SDXL | Text-to-image |
| Image | FLUX | Latest image generation |
| Audio | Whisper | Speech-to-text |
| Video | Stable Video | Image-to-video |
| Code | CodeLlama | Code generation |
| Upscale | Real-ESRGAN | Image upscaling |

## Pricing

| GPU | Price | Best For |
|-----|-------|----------|
| CPU | $0.000100/sec | Light tasks |
| Nvidia T4 | $0.000225/sec | Inference |
| Nvidia A40 | $0.000575/sec | Medium models |
| Nvidia A100 | $0.001150/sec | Large models |

No minimum. Pay per second of compute.

## Replicate vs Alternatives

| Feature | Replicate | HuggingFace | Together AI | Modal |
|---------|-----------|-------------|-------------|-------|
| Model hosting | 10,000+ | 500K+ models | 200+ | Custom |
| Custom models | Yes (Cog) | Yes (Spaces) | Limited | Yes |
| Pricing | Per second | Per second | Per token | Per second |
| GPU management | Zero | Zero | Zero | Semi |
| API simplicity | Very simple | Moderate | Simple | Code-based |

## FAQ

**Q: How fast is cold start?**
A: The first run may take 10-30 seconds while the model loads. Subsequent runs are faster. Use "always-on" deployments for instant starts.

**Q: Can I fine-tune on Replicate?**
A: Yes, several models support fine-tuning directly on Replicate (SDXL, Llama, etc.).

**Q: Is it expensive for production?**
A: Cost-effective for variable workloads (pay per second).
For sustained high volume, dedicated GPUs may be cheaper.

## Source & Thanks

> Created by [Replicate](https://replicate.com).
>
> [replicate.com](https://replicate.com) — Run AI models in the cloud

---

Source: https://tokrepo.com/en/workflows/e80aca76-b9b8-4330-8611-ee1ead26c99e
Author: AI Open Source