Skills · May 7, 2026 · 4 min read

Replicate Cog — Containerize ML Models with One YAML File

Cog is Replicate's open-source tool to wrap an ML model in a Docker container. One cog.yaml + predict.py gives you a portable, GPU-aware HTTP model.

Replicate · Community

Universal CLI command
npx tokrepo install 406d216d-018b-4242-8a26-a4a8df47bb4c
Introduction

Cog is Replicate's open-source tool for packaging an ML model as a Docker container with an HTTP prediction API. You describe the environment in cog.yaml and the inference code in predict.py; cog build then produces a portable image you can run locally, on Replicate, on Kubernetes, or on your own GPU box. Best for: ML researchers and engineers who want to ship a reproducible model without writing Dockerfiles. Works with: Linux, macOS, Windows (WSL2). Setup time: 10 minutes.


cog.yaml

build:
  gpu: true
  cuda: "12.1"
  python_version: "3.11"
  python_packages:
    - "torch==2.4.0"
    - "transformers==4.45.0"
    - "pillow==11.0.0"
predict: "predict.py:Predictor"

predict.py

from cog import BasePredictor, Input, Path
from PIL import Image
import torch

class Predictor(BasePredictor):
    def setup(self):
        """Load model into memory once at boot."""
        # weights= replaces the deprecated pretrained=True flag in recent torchvision releases.
        self.model = torch.hub.load("pytorch/vision", "resnet50", weights="IMAGENET1K_V2")
        self.model.eval()

    def predict(
        self,
        image: Path = Input(description="Image to classify"),
        top_k: int = Input(default=3, ge=1, le=10),
    ) -> dict:
        img = Image.open(image).convert("RGB")
        # ... preprocess img and run self.model ...
        # Placeholder output: replace with labels derived from the model's scores.
        return {"top_classes": ["cat", "tabby", "egyptian"][:top_k]}
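
The placeholder return in predict() stands in for a top-k selection over the model's scores. A minimal sketch of that step in plain Python, assuming the model yields one raw score (logit) per class; the function name top_k_classes and the sample labels are hypothetical, not part of Cog or torchvision:

```python
import math

def top_k_classes(logits, labels, k=3):
    """Return the k most probable labels with their softmax probabilities."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort class indices by probability, highest first, and keep the first k.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [{"label": labels[i], "score": probs[i]} for i in order]

# Hypothetical scores for four classes.
result = top_k_classes([2.0, 0.5, 1.0, -1.0], ["cat", "dog", "tabby", "car"], k=2)
print([r["label"] for r in result])  # ['cat', 'tabby']
```

In a real predictor the logits would come from the model's output tensor and the labels from the dataset's class list.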

Build, run, deploy

# Build the image
cog build -t resnet50

# Run locally with GPU
cog predict -i image=@cat.jpg -i top_k=5

# Push to Replicate
cog push r8.im/yourname/resnet50

# Or deploy elsewhere (Cog images are standard Docker)
docker run -p 5000:5000 --gpus=all resnet50
curl http://localhost:5000/predictions \
  -H 'Content-Type: application/json' \
  -d '{"input": {"image": "...base64..."}}'
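
The curl call above can also be assembled from Python. This sketch only builds the request, so it runs without a container; it assumes the server accepts file inputs as base64 data URIs and that the image listens on port 5000 as in the docker run line. The helper name build_prediction_request and the sample bytes are hypothetical:

```python
import base64
import json
import urllib.request

def build_prediction_request(image_bytes, top_k=3, mime_type="image/jpeg",
                             url="http://localhost:5000/predictions"):
    """Build (but do not send) a POST request for a running Cog container."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    payload = {
        "input": {
            # Assumption: the file input is passed as a data URI.
            "image": f"data:{mime_type};base64,{encoded}",
            "top_k": top_k,
        }
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Stand-in bytes; in practice read them from an image file.
request = build_prediction_request(b"not-a-real-image", top_k=5)
body = json.loads(request.data)
print(sorted(body["input"]))

# To send it against a running container:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Input names in the payload must match the parameter names of predict().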

What you get for free

  • Type-checked, schema-documented inputs (Cog generates OpenAPI)
  • GPU support via gpu: true and a pinned CUDA version (using several GPUs is up to your model code)
  • Auto-detects PyTorch / TensorFlow / JAX and pins versions
  • Path outputs returned as files; on Replicate they are uploaded and served by URL
  • Works as a standard Docker image anywhere
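
To illustrate the schema-documented inputs, the sketch below summarizes the input section of an OpenAPI document. The sample_schema dictionary is hand-written for this example and is not captured Cog output; the path components.schemas.Input and the function name summarize_inputs are assumptions about the schema's shape:

```python
def summarize_inputs(openapi_schema):
    """List each model input with its type, default, and numeric bounds."""
    input_schema = openapi_schema["components"]["schemas"]["Input"]
    required = set(input_schema.get("required", []))
    summary = []
    for name, spec in input_schema.get("properties", {}).items():
        summary.append({
            "name": name,
            "type": spec.get("type"),
            "required": name in required,
            "default": spec.get("default"),
            "minimum": spec.get("minimum"),
            "maximum": spec.get("maximum"),
        })
    return summary

# Hand-written stand-in mirroring the predict() signature shown earlier.
sample_schema = {
    "components": {"schemas": {"Input": {
        "type": "object",
        "required": ["image"],
        "properties": {
            "image": {"type": "string", "format": "uri",
                      "description": "Image to classify"},
            "top_k": {"type": "integer", "default": 3,
                      "minimum": 1, "maximum": 10},
        },
    }}}
}

for entry in summarize_inputs(sample_schema):
    print(entry)
```

The ge=1 and le=10 constraints on top_k would appear as minimum and maximum in such a schema, which is how clients can validate inputs before sending them.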

FAQ

Q: Is Cog free? A: Yes. Cog is open-source under Apache-2.0. Replicate's hosting is paid (per-second GPU billing), but you are free to run Cog images anywhere Docker runs.

Q: Does Cog work on Apple Silicon? A: Yes — gpu: false produces CPU-only images that run on Apple Silicon. For GPU inference on Mac, you'll need to deploy elsewhere (Replicate, Lambda, your own GPU box).

Q: How does this differ from a regular Dockerfile? A: Cog generates the Dockerfile for you, pinning CUDA, PyTorch, and system libraries, with caching. You get strongly-typed inputs, OpenAPI docs, and file-output handling without writing them yourself. For non-ML workloads, a regular Dockerfile is simpler.


Quick Use

  1. Install: brew install cog (macOS) or sudo curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/latest/download/cog_$(uname -s)_$(uname -m) && sudo chmod +x /usr/local/bin/cog
  2. Create cog.yaml and predict.py (templates in this asset)
  3. cog predict to test locally; cog push to ship to Replicate
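
Before running the steps above, it can help to confirm that the required tools are installed. A small preflight sketch, assuming the executables are named docker and cog; the function name check_prerequisites is hypothetical:

```python
import shutil

def check_prerequisites(tools=("docker", "cog")):
    """Map each required command-line tool to whether it is found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}

status = check_prerequisites()
for tool, found in status.items():
    print(f"{tool}: {'found' if found else 'missing'}")
```

The check only looks for the executables; it does not verify that the Docker daemon is running.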

Source & Thanks

Built by Replicate. Licensed under Apache-2.0.

replicate/cog — ⭐ 9,000+
