# Replicate Cog — Containerize ML Models with One YAML File

> Cog is Replicate's open-source tool for packaging an ML model in a Docker container. One `cog.yaml` plus a `predict.py` gives you a portable, GPU-aware model served over HTTP.

## Install

Save the content below to `.claude/skills/` or append to your `CLAUDE.md`:

## Quick Use

1. Install: `brew install cog` (macOS) or `sudo curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/latest/download/cog_$(uname -s)_$(uname -m) && sudo chmod +x /usr/local/bin/cog`
2. Create `cog.yaml` and `predict.py` (templates in this asset)
3. `cog predict` to test locally; `cog push` to ship to Replicate

---

## Intro

Cog is Replicate's open-source tool that wraps an ML model in a Docker container with a clean HTTP API. Define a `cog.yaml` for environment, a `predict.py` for inference, and `cog build` produces a portable image you can run anywhere — locally, on Replicate, on Kubernetes, on your own GPU box. Best for: ML researchers / engineers who want to ship a reproducible model without writing Dockerfiles. Works with: Linux, macOS, Windows (WSL2). Setup time: 10 minutes.

---

### cog.yaml

```yaml
build:
  gpu: true
  cuda: "12.1"
  python_version: "3.11"
  python_packages:
    - "torch==2.4.0"
    - "transformers==4.45.0"
    - "pillow==11.0.0"
predict: "predict.py:Predictor"
```

### predict.py

```python
from cog import BasePredictor, Input, Path
from PIL import Image
import torch

class Predictor(BasePredictor):
    def setup(self):
        """Load model into memory once at boot."""
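        # `weights=` replaces the deprecated `pretrained=True` in recent torchvision
        self.model = torch.hub.load("pytorch/vision", "resnet50", weights="IMAGENET1K_V2")
        self.model.eval()  # inference mode: disables dropout and batch-norm updates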

    def predict(
        self,
        image: Path = Input(description="Image to classify"),
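        top_k: int = Input(description="Number of classes to return", default=3, ge=1, le=10),
    ) -> dict:
        img = Image.open(image).convert("RGB")  # normalize to 3-channel RGB
        # ... preprocess and run model ...
        # Placeholder output: a real predictor would return the model's own top_k labels
        return {"top_classes": ["cat", "tabby", "egyptian"][:top_k]}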
```
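
The elided preprocess-and-run step above would end by turning raw model scores into the `top_k` labels. Here is a minimal, standard-library-only sketch of that last step, assuming the model yields one score (logit) per class. `top_k_classes` and `labels` are hypothetical names for illustration and are not part of Cog or the original example.

```python
import math


def top_k_classes(logits, labels, top_k=3):
    """Return the top_k (label, probability) pairs, highest probability first."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort by probability, descending, and keep the first top_k entries.
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]
```

In a real predictor the logits would come from `self.model(...)` and the labels from the dataset's class list; both are left out here because the original example elides them.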

### Build, run, deploy

```bash
# Build the image
cog build -t resnet50

# Run locally with GPU
cog predict -i image=@cat.jpg -i top_k=5

# Push to Replicate
cog push r8.im/yourname/resnet50

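# Or deploy elsewhere (Cog images are standard Docker)
# -d runs the server in the background so the curl below can use the same shell
docker run -d -p 5000:5000 --gpus=all resnet50
# File inputs such as "image" are sent as a URL or a base64 data URI
curl http://localhost:5000/predictions \
  -H 'Content-Type: application/json' \
  -d '{"input": {"image": "...base64..."}}'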
```
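
The `curl` call above can also be made from Python. The sketch below uses only the standard library and assumes the container from the `docker run` line is listening on `localhost:5000` and accepts the image as a base64 data URI. `build_payload` and `predict` are hypothetical helper names, not part of Cog.

```python
import base64
import json
import mimetypes
import urllib.request


def build_payload(image_bytes, filename, top_k=3):
    """Build the JSON body for POST /predictions with the image as a data URI."""
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"input": {"image": f"data:{mime};base64,{encoded}", "top_k": top_k}}


def predict(image_path, top_k=3, url="http://localhost:5000/predictions"):
    """Send one prediction request and return the decoded JSON response."""
    with open(image_path, "rb") as f:
        payload = build_payload(f.read(), image_path, top_k)
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `predict("cat.jpg", top_k=5)` should return the server's JSON response, with the prediction expected under an `output` key. Splitting out `build_payload` keeps the encoding logic checkable without a running container.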

### What you get for free

- Type-checked, schema-documented inputs (Cog generates OpenAPI)
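- GPU support via `gpu: true`, with the CUDA version either pinned in `cog.yaml` or chosen for you
- Picks CUDA and cuDNN versions compatible with the PyTorch or TensorFlow version you list
- `Path` outputs are handled as files: uploaded and returned as URLs on Replicate, and typically inlined as data URIs when you run the container yourself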
- Works as a standard Docker image anywhere
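
To make the first bullet concrete: `Input(default=3, ge=1, le=10)` in `predict.py` declares that `top_k` is an integer from 1 to 10 inclusive, defaulting to 3. The sketch below is a hand-written stand-in for that kind of check, not Cog's own validation code, and `check_top_k` is a hypothetical name.

```python
def check_top_k(value=3, ge=1, le=10):
    """Return value if it is an int within [ge, le]; otherwise raise."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"top_k must be an integer, got {type(value).__name__}")
    if not ge <= value <= le:
        raise ValueError(f"top_k must be between {ge} and {le}, got {value}")
    return value
```

With Cog you declare the bounds once in the signature and do not write this function; requests that violate them are rejected before `predict()` runs.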

---

### FAQ

**Q: Is Cog free?**
A: Yes. Cog is open-source under Apache-2.0. Replicate's hosting is paid (per-second GPU billing), but you can run Cog images anywhere Docker runs without paying Replicate; you cover only your own compute.

**Q: Does Cog work on Apple Silicon?**
A: Yes — `gpu: false` produces CPU-only images that run on Apple Silicon. For GPU inference on Mac, you'll need to deploy elsewhere (Replicate, Lambda, your own GPU box).

**Q: How does this differ from a regular Dockerfile?**
A: Cog generates the Dockerfile for you, pinning CUDA, PyTorch, and system libraries, with caching. You get typed inputs, an OpenAPI schema, and file-output handling without writing them yourself. For non-ML workloads, regular Docker is simpler.

---

## Source & Thanks

> Built by [Replicate](https://github.com/replicate). Licensed under Apache-2.0.
>
> [replicate/cog](https://github.com/replicate/cog) — ⭐ 9,000+

---
Source: https://tokrepo.com/en/workflows/replicate-cog-containerize-ml-models-with-one-yaml-file
Author: Replicate