Scripts · Apr 8, 2026 · 3 min read

Replicate — Run AI Models via Simple API Calls

Cloud platform to run open-source AI models with a simple API. Replicate hosts Llama, Stable Diffusion, Whisper, and thousands of models — no GPU setup or Docker required.

AI · Open Source · Community
Quick Use

Use it first, then decide how deep to go

The snippets below show what to install and run first.

# Install the SDK and set your API token first:
#   pip install replicate
#   export REPLICATE_API_TOKEN=r8_...
import replicate

# Run Llama 3.1
output = replicate.run(
    "meta/meta-llama-3.1-405b-instruct",
    input={"prompt": "Explain quantum computing in simple terms"},
)
print("".join(output))

# Generate an image with SDXL (omitting a version hash runs the latest version)
output = replicate.run(
    "stability-ai/sdxl",
    input={"prompt": "A sunset over mountains, oil painting style"},
)
print(output[0])  # Image URL

What is Replicate?

Replicate is a cloud platform that runs open-source AI models via a simple API. No GPU provisioning, no Docker, no model serving code — just call replicate.run() with a model name and input. It hosts thousands of models including Llama, Stable Diffusion, Whisper, and community fine-tunes. Pay only for compute time used.

Answer-Ready: Replicate runs open-source AI models via simple API. No GPU setup needed. Hosts Llama, Stable Diffusion, Whisper, and 10,000+ models. Pay-per-second billing. Push custom models with Cog. Python, Node.js, and HTTP API. Used by thousands of AI startups.

Best for: Developers wanting to use open-source models without managing GPUs. Works with: Any language via HTTP API, Python SDK, Node.js SDK. Setup time: Under 2 minutes.

Core Features

1. Run Any Model

# Text generation
replicate.run("meta/meta-llama-3.1-70b-instruct", input={"prompt": "..."})

# Image generation
replicate.run("stability-ai/sdxl", input={"prompt": "..."})

# Speech-to-text
replicate.run("openai/whisper", input={"audio": open("speech.mp3", "rb")})

# Image upscaling
replicate.run("nightmareai/real-esrgan", input={"image": "https://..."})

# Video generation
replicate.run("anotherjesse/zeroscope-v2-xl", input={"prompt": "..."})
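Every call above has the same shape: a model identifier plus an input dict. That uniformity makes a thin routing helper trivial to write. A sketch (the task-to-model mapping is copied from the examples above; `run_task` and its `runner` parameter are illustrative names, not part of the SDK):

```python
# Map task names to the Replicate model identifiers shown above.
TASK_MODELS = {
    "text": "meta/meta-llama-3.1-70b-instruct",
    "image": "stability-ai/sdxl",
    "speech-to-text": "openai/whisper",
    "upscale": "nightmareai/real-esrgan",
    "video": "anotherjesse/zeroscope-v2-xl",
}

def run_task(task, input, runner):
    """Look up the model for `task` and hand it to `runner`.

    Pass `replicate.run` as `runner` in real use; any callable with the
    same (model, input=...) signature works for testing.
    """
    if task not in TASK_MODELS:
        raise ValueError(f"unknown task: {task}")
    return runner(TASK_MODELS[task], input=input)
```

In real use: `run_task("text", {"prompt": "hi"}, replicate.run)`.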

2. Streaming

for event in replicate.stream(
    "meta/meta-llama-3.1-70b-instruct",
    input={"prompt": "Write a story about AI"},
):
    print(str(event), end="")
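`replicate.stream` yields token events one at a time; joining them back into a full string is a one-liner worth wrapping. A framework-free sketch (plain strings stand in for stream events here):

```python
def collect_stream(events):
    """Concatenate streamed events (tokens) into one string."""
    return "".join(str(event) for event in events)
```

In real use: `story = collect_stream(replicate.stream("meta/meta-llama-3.1-70b-instruct", input={"prompt": "Write a story about AI"}))`.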

3. Custom Models (Cog)

# Package your own model
pip install cog
cog init
# Edit predict.py with your model code
cog push r8.im/username/my-model
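`cog init` scaffolds a `cog.yaml` next to `predict.py` that declares the environment your model runs in. A minimal sketch of what it typically contains (the package names and versions here are placeholders for your own dependencies):

```yaml
build:
  gpu: true
  python_version: "3.11"
  python_packages:
    - "torch==2.1.0"
predict: "predict.py:Predictor"
```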

4. Webhooks

prediction = replicate.predictions.create(
    model="stability-ai/sdxl",
    input={"prompt": "..."},
    webhook="https://your-app.com/webhook",
    webhook_events_filter=["completed"],
)
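When the prediction finishes, Replicate POSTs the prediction object as JSON to your webhook URL. A minimal handler sketch (the `status`/`output`/`error` fields follow Replicate's prediction shape; `handle_webhook` is an illustrative name, and a production handler should also verify the webhook signature):

```python
def handle_webhook(payload):
    """React to a prediction webhook payload (a dict parsed from the POSTed JSON)."""
    status = payload.get("status")
    if status == "succeeded":
        return {"ok": True, "output": payload.get("output")}
    if status in ("failed", "canceled"):
        return {"ok": False, "error": payload.get("error")}
    # Intermediate events ("starting", "processing") arrive only if you
    # subscribe to them via webhook_events_filter.
    return {"ok": None}
```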

Popular Models

| Category | Model | Use Case |
|---|---|---|
| Text | Llama 3.1 405B | Best open-source chat |
| Image | SDXL | Text-to-image |
| Image | FLUX | Latest image generation |
| Audio | Whisper | Speech-to-text |
| Video | Stable Video | Image-to-video |
| Code | CodeLlama | Code generation |
| Upscale | Real-ESRGAN | Image upscaling |

Pricing

| Hardware | Price | Best For |
|---|---|---|
| CPU | $0.000100/sec | Light tasks |
| Nvidia T4 | $0.000225/sec | Inference |
| Nvidia A40 | $0.000575/sec | Medium models |
| Nvidia A100 | $0.001150/sec | Large models |

No minimum. Pay per second of compute.
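Per-second billing makes costs easy to estimate: multiply runtime by the GPU rate. For example, a 20-second SDXL run on an A40 at $0.000575/sec costs about $0.0115. A quick calculator (rates copied from the table above):

```python
# Per-second rates from the pricing table above (USD).
RATES = {
    "cpu": 0.000100,
    "t4": 0.000225,
    "a40": 0.000575,
    "a100": 0.001150,
}

def estimate_cost(gpu, seconds):
    """Estimated cost in USD for `seconds` of compute on `gpu`."""
    return RATES[gpu] * seconds
```

So an hour of sustained A100 time runs about $4.14, which is the point where the FAQ's note on dedicated GPUs becomes relevant.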

Replicate vs Alternatives

| Feature | Replicate | Hugging Face | Together AI | Modal |
|---|---|---|---|---|
| Model hosting | 10,000+ models | 500K+ models | 200+ models | Custom |
| Custom models | Yes (Cog) | Yes (Spaces) | Limited | Yes |
| Pricing | Per second | Per second | Per token | Per second |
| GPU management | Zero | Zero | Zero | Semi-managed |
| API simplicity | Very simple | Moderate | Simple | Code-based |

FAQ

Q: How fast is a cold start? A: The first run may take 10-30 seconds while the model loads; subsequent runs are faster. Keep an instance always on for instant starts.

Q: Can I fine-tune on Replicate? A: Yes, several models support fine-tuning directly on Replicate (SDXL, Llama, etc.).

Q: Is it expensive for production? A: Cost-effective for variable workloads (pay per second). For sustained high volume, dedicated GPUs may be cheaper.


Source & Thanks

Created by Replicate.

replicate.com — Run AI models in the cloud
