Quick Use
- Install:
brew install cog(macOS) orsudo curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/latest/download/cog_$(uname -s)_$(uname -m) && sudo chmod +x /usr/local/bin/cog - Create
cog.yamlandpredict.py(templates in this asset) cog predictto test locally;cog pushto ship to Replicate
Intro
Cog is Replicate's open-source tool that wraps an ML model in a Docker container with a clean HTTP API. Define a cog.yaml for environment, a predict.py for inference, and cog build produces a portable image you can run anywhere — locally, on Replicate, on Kubernetes, on your own GPU box. Best for: ML researchers / engineers who want to ship a reproducible model without writing Dockerfiles. Works with: Linux, macOS, Windows (WSL2). Setup time: 10 minutes.
cog.yaml
build:
gpu: true
cuda: "12.1"
python_version: "3.11"
python_packages:
- "torch==2.4.0"
- "transformers==4.45.0"
- "pillow==11.0.0"
predict: "predict.py:Predictor"predict.py
from cog import BasePredictor, Input, Path
from PIL import Image
import torch
class Predictor(BasePredictor):
def setup(self):
"""Load model into memory once at boot."""
self.model = torch.hub.load("pytorch/vision", "resnet50", pretrained=True)
self.model.eval()
def predict(
self,
image: Path = Input(description="Image to classify"),
top_k: int = Input(default=3, ge=1, le=10),
) -> dict:
img = Image.open(image)
# ... preprocess and run model ...
return {"top_classes": ["cat", "tabby", "egyptian"][:top_k]}Build, run, deploy
# Build the image
cog build -t resnet50
# Run locally with GPU
cog predict -i image=@cat.jpg -i top_k=5
# Push to Replicate
cog push r8.im/yourname/resnet50
# Or deploy elsewhere (Cog images are standard Docker)
docker run -p 5000:5000 --gpus=all resnet50
curl http://localhost:5000/predictions \
-H 'Content-Type: application/json' \
-d '{"input": {"image": "...base64..."}}'What you get for free
- Type-checked, schema-documented inputs (Cog generates OpenAPI)
- Multi-GPU support via
gpu: trueand CUDA version pin - Auto-detects PyTorch / TensorFlow / JAX and pins versions
- Output of
Pathtypes automatically uploaded to a CDN - Works as a standard Docker image anywhere
FAQ
Q: Is Cog free? A: Yes — Cog is open-source under Apache-2.0. Replicate's hosting is paid (per-second GPU billing), but you can deploy Cog images anywhere Docker runs for free.
Q: Does Cog work on Apple Silicon?
A: Yes — gpu: false produces CPU-only images that run on Apple Silicon. For GPU inference on Mac, you'll need to deploy elsewhere (Replicate, Lambda, your own GPU box).
Q: How does this differ from a regular Dockerfile? A: Cog generates the Dockerfile for you — pinning CUDA, PyTorch, system libraries, with caching. You get strongly-typed inputs, OpenAPI docs, and CDN-uploaded outputs without writing them. For non-ML workloads, regular Docker is simpler.
Source & Thanks
Built by Replicate. Licensed under Apache-2.0.
replicate/cog — ⭐ 9,000+