Skills · May 7, 2026 · 4 min read

Replicate Cog — Containerize ML Models with One YAML File

Cog is Replicate's open-source tool for wrapping an ML model in a Docker container. One cog.yaml plus one predict.py gives you a portable, GPU-aware model served over HTTP.

Agent ready

This asset can be read and installed directly by agents

TokRepo exposes a universal CLI command, install contract, metadata JSON, adapter-aware plan, and raw content links so agents can judge fit, risk, and next actions.

Needs Confirmation · 52/100
Policy: confirm
Agent surface: Any MCP/CLI agent
Kind: Skill
Install: Single
Trust: New
Entrypoint: Asset

Universal CLI install command

npx tokrepo install 406d216d-018b-4242-8a26-a4a8df47bb4c

Intro

Cog is Replicate's open-source tool that wraps an ML model in a Docker container with a clean HTTP API. Define the environment in cog.yaml and the inference code in predict.py, and cog build produces a portable image you can run anywhere: locally, on Replicate, on Kubernetes, or on your own GPU box. Best for: ML researchers and engineers who want to ship a reproducible model without writing Dockerfiles. Works with: Linux, macOS, Windows (WSL2). Setup time: 10 minutes.


cog.yaml

build:
  gpu: true
  cuda: "12.1"
  python_version: "3.11"
  python_packages:
    - "torch==2.4.0"
    - "transformers==4.45.0"
    - "pillow==11.0.0"
predict: "predict.py:Predictor"

predict.py

from cog import BasePredictor, Input, Path
from PIL import Image
import torch

class Predictor(BasePredictor):
    def setup(self):
        """Load model into memory once at boot."""
        # `pretrained=True` is deprecated in current torchvision; select weights by name instead.
        self.model = torch.hub.load("pytorch/vision", "resnet50", weights="IMAGENET1K_V2")
        self.model.eval()

    def predict(
        self,
        image: Path = Input(description="Image to classify"),
        top_k: int = Input(default=3, ge=1, le=10),
    ) -> dict:
        img = Image.open(image)
        # ... preprocess and run model ...
        # The labels returned below are placeholders standing in for real model output.
        return {"top_classes": ["cat", "tabby", "egyptian"][:top_k]}
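The predict() method above returns hard-coded labels where the model output would go. A minimal sketch of that last step, assuming the model yields one raw score per class. The helper name top_k_classes, the scores field, and the sample labels are illustrative and not part of Cog or the original predictor.

```python
import math
from typing import Dict, List, Sequence


def top_k_classes(
    logits: Sequence[float], labels: Sequence[str], top_k: int = 3
) -> Dict[str, List]:
    """Softmax the raw scores and return the top_k labels with their probabilities."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    # Subtract the largest score before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first, and keep the first top_k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    return {
        "top_classes": [labels[i] for i in ranked],
        "scores": [round(probs[i], 4) for i in ranked],
    }


if __name__ == "__main__":
    demo = top_k_classes([2.0, 0.5, 3.1, -1.0], ["cat", "dog", "tabby", "car"], top_k=2)
    print(demo["top_classes"])  # ['tabby', 'cat']
```

Inside a real predictor, the logits would come from the model's forward pass and the labels from the dataset's class list.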

Build, run, deploy

# Build the image
cog build -t resnet50

# Run locally with GPU
cog predict -i image=@cat.jpg -i top_k=5

# Push to Replicate
cog push r8.im/yourname/resnet50

# Or deploy elsewhere (Cog images are standard Docker)
docker run -p 5000:5000 --gpus=all resnet50
curl http://localhost:5000/predictions \
  -H 'Content-Type: application/json' \
  -d '{"input": {"image": "...base64..."}}'
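The same request can be sent from Python. The sketch below uses only the standard library: it encodes a local file as a base64 data URI for the image input and posts it to the /predictions endpoint shown above. The function names are hypothetical, and it assumes the container started by the docker run command is listening on localhost:5000 and accepts data URIs for file inputs.

```python
import base64
import json
import mimetypes
import urllib.request


def build_prediction_payload(image_bytes: bytes, filename: str, top_k: int = 3) -> bytes:
    """Build the JSON request body, with the image embedded as a data URI."""
    # Guess the MIME type from the file extension; fall back to a generic type.
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_uri = f"data:{mime};base64,{encoded}"
    body = {"input": {"image": data_uri, "top_k": top_k}}
    return json.dumps(body).encode("utf-8")


def post_prediction(payload: bytes, url: str = "http://localhost:5000/predictions") -> dict:
    """POST the payload to a running container and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage, with the container running:
#   with open("cat.jpg", "rb") as f:
#       reply = post_prediction(build_prediction_payload(f.read(), "cat.jpg", top_k=5))
```

Keeping payload construction separate from the network call makes the encoding step easy to check without a running container.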

What you get for free

  • Type-checked, schema-documented inputs (Cog generates OpenAPI)
  • GPU support via gpu: true, with a pinned CUDA version
  • CUDA and cuDNN versions chosen to be compatible with the pinned PyTorch or TensorFlow release
  • Path outputs uploaded and returned as URLs when the model runs on Replicate
  • Works as a standard Docker image anywhere
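To make the first bullet concrete, here is a rough sketch of the kind of check that the ge=1, le=10 constraint on top_k implies. The schema fragment is hand-written in an OpenAPI-like style and validate_inputs is a hypothetical helper; the schema Cog actually generates, and the way it enforces it, may differ.

```python
from typing import Any, Dict

# Hypothetical, hand-written fragment in the style of an OpenAPI input schema.
INPUT_SCHEMA: Dict[str, Dict[str, Any]] = {
    "top_k": {"type": "integer", "default": 3, "minimum": 1, "maximum": 10},
}


def validate_inputs(
    raw: Dict[str, Any], schema: Dict[str, Dict[str, Any]] = INPUT_SCHEMA
) -> Dict[str, Any]:
    """Apply defaults and reject values that fall outside the declared constraints."""
    unknown = set(raw) - set(schema)
    if unknown:
        raise ValueError(f"unexpected inputs: {sorted(unknown)}")
    clean: Dict[str, Any] = {}
    for name, rules in schema.items():
        value = raw.get(name, rules.get("default"))
        if value is None:
            raise ValueError(f"missing required input: {name}")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if rules["type"] == "integer" and (isinstance(value, bool) or not isinstance(value, int)):
            raise ValueError(f"{name} must be an integer")
        if "minimum" in rules and value < rules["minimum"]:
            raise ValueError(f"{name} must be >= {rules['minimum']}")
        if "maximum" in rules and value > rules["maximum"]:
            raise ValueError(f"{name} must be <= {rules['maximum']}")
        clean[name] = value
    return clean


if __name__ == "__main__":
    print(validate_inputs({}))            # {'top_k': 3}
    print(validate_inputs({"top_k": 5}))  # {'top_k': 5}
```

With Cog, this validation comes from the Input(...) declarations, so none of it needs to be written by hand.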

FAQ

Q: Is Cog free? A: Yes. Cog is open-source under Apache-2.0. Replicate's hosting is paid (per-second GPU billing), but Cog images are standard Docker images, so you can run them on your own infrastructure with no license fee.

Q: Does Cog work on Apple Silicon? A: Yes — gpu: false produces CPU-only images that run on Apple Silicon. For GPU inference on Mac, you'll need to deploy elsewhere (Replicate, Lambda, your own GPU box).

Q: How does this differ from a regular Dockerfile? A: Cog generates the Dockerfile for you, pinning CUDA, PyTorch, and system libraries, with caching. You get strongly-typed inputs, OpenAPI docs, and (on Replicate) hosted output files without writing that plumbing yourself. For non-ML workloads, a regular Dockerfile is simpler.


Quick Use

  1. Install: brew install cog (macOS) or sudo curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/latest/download/cog_$(uname -s)_$(uname -m) && sudo chmod +x /usr/local/bin/cog
  2. Create cog.yaml and predict.py (templates in this asset)
  3. Run cog predict to test locally, then cog push to ship to Replicate

Source & Thanks

Built by Replicate. Licensed under Apache-2.0.

replicate/cog — ⭐ 9,000+
