How do I install Replicate Cog — Containerize ML Models with One YAML File?

Visit the asset page on TokRepo and click "Copy for agent" to get the installation instructions. Most assets can be installed with a single command.

Replicate Cog — Containerize ML Models with One YAML File

from cog import BasePredictor, Input, Path from PIL import Image import torch class Predictor(BasePredictor): def setup(self): """Load model into memory once at boot.""" self.model = torch.hub.load("pytorch/vision", "resnet50", pretrained=True) self.model.eval() def predict( self, image: Path = Input(description="Image to classify"), top_k: int = Input(default=3, ge=1, le=10), ) -> dict: img = Image.open(image) # ... preprocess and run model ... return {"top_classes": ["cat", "tabby", "egyptian"][:top_k]}

# Build the image cog build -t resnet50 # Run locally with GPU cog predict -i image=@cat.jpg -i top_k=5 # Push to Replicate cog push r8.im/yourname/resnet50 # Or deploy elsewhere (Cog images are standard Docker) docker run -p 5000:5000 --gpus=all resnet50 curl http://localhost:5000/predictions \ -H 'Content-Type: application/json' \ -d '{"input": {"image": "...base64..."}}'

Quick Use

Install: brew install cog (macOS) or sudo curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/latest/download/cog_$(uname -s)_$(uname -m) && sudo chmod +x /usr/local/bin/cog
Create cog.yaml and predict.py (templates in this asset)
cog predict to test locally; cog push to ship to Replicate

Intro

Cog is Replicate's open-source tool that wraps an ML model in a Docker container with a clean HTTP API. Define a cog.yaml for environment, a predict.py for inference, and cog build produces a portable image you can run anywhere — locally, on Replicate, on Kubernetes, on your own GPU box. Best for: ML researchers / engineers who want to ship a reproducible model without writing Dockerfiles. Works with: Linux, macOS, Windows (WSL2). Setup time: 10 minutes.

cog.yaml

build:
  gpu: true
  cuda: "12.1"
  python_version: "3.11"
  python_packages:
    - "torch==2.4.0"
    - "transformers==4.45.0"
    - "pillow==11.0.0"
predict: "predict.py:Predictor"

predict.py

from cog import BasePredictor, Input, Path
from PIL import Image
import torch

class Predictor(BasePredictor):
    def setup(self):
        """Load model into memory once at boot."""
        self.model = torch.hub.load("pytorch/vision", "resnet50", pretrained=True)
        self.model.eval()

    def predict(
        self,
        image: Path = Input(description="Image to classify"),
        top_k: int = Input(default=3, ge=1, le=10),
    ) -> dict:
        img = Image.open(image)
        # ... preprocess and run model ...
        return {"top_classes": ["cat", "tabby", "egyptian"][:top_k]}

Build, run, deploy

# Build the image
cog build -t resnet50

# Run locally with GPU
cog predict -i image=@cat.jpg -i top_k=5

# Push to Replicate
cog push r8.im/yourname/resnet50

# Or deploy elsewhere (Cog images are standard Docker)
docker run -p 5000:5000 --gpus=all resnet50
curl http://localhost:5000/predictions \
  -H 'Content-Type: application/json' \
  -d '{"input": {"image": "...base64..."}}'

What you get for free

Type-checked, schema-documented inputs (Cog generates OpenAPI)
Multi-GPU support via gpu: true and CUDA version pin
Auto-detects PyTorch / TensorFlow / JAX and pins versions
Output of Path types automatically uploaded to a CDN
Works as a standard Docker image anywhere

FAQ

Q: Is Cog free? A: Yes — Cog is open-source under Apache-2.0. Replicate's hosting is paid (per-second GPU billing), but you can deploy Cog images anywhere Docker runs for free.

Q: Does Cog work on Apple Silicon? A: Yes — gpu: false produces CPU-only images that run on Apple Silicon. For GPU inference on Mac, you'll need to deploy elsewhere (Replicate, Lambda, your own GPU box).

Q: How does this differ from a regular Dockerfile? A: Cog generates the Dockerfile for you — pinning CUDA, PyTorch, system libraries, with caching. You get strongly-typed inputs, OpenAPI docs, and CDN-uploaded outputs without writing them. For non-ML workloads, regular Docker is simpler.

Source & Thanks

Built by Replicate. Licensed under Apache-2.0.

replicate/cog — ⭐ 9,000+

Replicate Cog — Containerize ML Models with One YAML File

Review-first install path

cog.yaml

predict.py

Build, run, deploy

What you get for free

FAQ

Quick Use

Intro

cog.yaml

predict.py

Build, run, deploy

What you get for free

FAQ

Source & Thanks

Source & Thanks

Discussion

Related Assets

Replicate — Run AI Models via Simple API Calls

Replicate Webhooks — Async Notifications for Slow Models

tiktoken — Fast BPE Tokenizer for OpenAI Models

CogVideo — Text and Image to Video Generation