# Replicate Cog: Containerize ML Models with One YAML File

> Cog is Replicate's open-source tool for wrapping an ML model in a Docker container. A `cog.yaml` plus a `predict.py` gives you a portable, GPU-aware model served over HTTP.

## Install

Save the content below to `.claude/skills/` or append to your `CLAUDE.md`:

## Quick Use

1. Install: `brew install cog` (macOS) or `sudo curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/latest/download/cog_$(uname -s)_$(uname -m) && sudo chmod +x /usr/local/bin/cog`
2. Create `cog.yaml` and `predict.py` (templates in this asset)
3. Run `cog predict` to test locally, then `cog push` to ship to Replicate

---

## Intro

Cog is Replicate's open-source tool that wraps an ML model in a Docker container with a clean HTTP API. You describe the environment in `cog.yaml` and the inference code in `predict.py`; `cog build` then produces a portable image you can run locally, on Replicate, on Kubernetes, or on your own GPU box.

Best for: ML researchers and engineers who want to ship a reproducible model without writing Dockerfiles.

Works with: Linux, macOS, Windows (WSL2).

Setup time: 10 minutes.

---

### cog.yaml

```yaml
build:
  gpu: true
  cuda: "12.1"
  python_version: "3.11"
  python_packages:
    - "torch==2.4.0"
    - "transformers==4.45.0"
    - "pillow==11.0.0"
predict: "predict.py:Predictor"
```

### predict.py

```python
from cog import BasePredictor, Input, Path
from PIL import Image
import torch


class Predictor(BasePredictor):
    def setup(self):
        """Load the model into memory once at boot."""
        # `weights=` replaces torchvision's deprecated `pretrained=True` flag.
        self.model = torch.hub.load("pytorch/vision", "resnet50", weights="IMAGENET1K_V1")
        self.model.eval()

    def predict(
        self,
        image: Path = Input(description="Image to classify"),
        top_k: int = Input(default=3, ge=1, le=10),
    ) -> dict:
        img = Image.open(image)
        # ... preprocess and run model ...
        # Placeholder result; replace with the model's real top-k labels.
        return {"top_classes": ["cat", "tabby", "egyptian"][:top_k]}
```

### Build, run, deploy

```bash
# Build the image
cog build -t resnet50

# Run a prediction locally (uses the GPU when `gpu: true`)
cog predict -i image=@cat.jpg -i top_k=5

# Push to Replicate
cog push r8.im/yourname/resnet50

# Or deploy elsewhere (Cog images are standard Docker images)
docker run -d -p 5000:5000 --gpus=all resnet50
curl http://localhost:5000/predictions \
  -H 'Content-Type: application/json' \
  -d '{"input": {"image": "...base64..."}}'
```

### What you get for free

- Type-checked, schema-documented inputs (Cog generates an OpenAPI schema from the `predict()` signature)
- GPU support via `gpu: true`, with an optional CUDA version pin
- Compatible CUDA and cuDNN versions selected for the PyTorch or TensorFlow version you specify
- `Path` outputs handled as files: on Replicate they are uploaded and returned as URLs, while a self-hosted container returns them inline as data URIs by default
- A standard Docker image that runs anywhere Docker does

---

### FAQ

**Q: Is Cog free?**
A: Yes. Cog is open-source under Apache-2.0. Replicate's hosting is paid (per-second GPU billing), but you can run Cog images anywhere Docker runs at no charge from Replicate.

**Q: Does Cog work on Apple Silicon?**
A: Yes. Setting `gpu: false` produces CPU-only images that run on Apple Silicon. For GPU inference you'll need to deploy elsewhere (Replicate, Lambda, your own GPU box).

**Q: How does this differ from a regular Dockerfile?**
A: Cog generates the Dockerfile for you, pinning CUDA, PyTorch, and system libraries, with build caching. You also get strongly typed inputs, OpenAPI docs, and file-output handling without writing them yourself. For non-ML workloads, regular Docker is simpler.

---

## Source & Thanks

> Built by [Replicate](https://github.com/replicate). Licensed under Apache-2.0.
>
> [replicate/cog](https://github.com/replicate/cog) (⭐ 9,000+)
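### Calling the container from Python (sketch)

The `curl` call under "Build, run, deploy" leaves the image payload elided. As a minimal sketch, the same request could be built from Python's standard library. It assumes the container from the `docker run` example is listening on `localhost:5000` and that the server accepts a base64 data URI for the `image` input. The helper names `file_to_data_uri`, `build_request`, and `classify` are hypothetical, not part of Cog.

```python
import base64
import json
import mimetypes
import urllib.request
from pathlib import Path

# Assumed endpoint: matches the `docker run -p 5000:5000` example.
DEFAULT_URL = "http://localhost:5000/predictions"


def file_to_data_uri(path: str) -> str:
    """Encode a local file as a base64 data URI."""
    mime, _ = mimetypes.guess_type(path)
    mime = mime or "application/octet-stream"
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_request(image_path: str, top_k: int = 3,
                  url: str = DEFAULT_URL) -> urllib.request.Request:
    """Build the POST request; input names mirror predict()'s parameters."""
    body = json.dumps(
        {"input": {"image": file_to_data_uri(image_path), "top_k": top_k}}
    )
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(image_path: str, top_k: int = 3) -> dict:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(image_path, top_k)) as resp:
        return json.loads(resp.read())
```

With the container running, `classify("cat.jpg", top_k=5)` would send the request and return the server's JSON response as a dictionary. Splitting out `build_request` keeps the payload inspectable without a live server.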
---

Source: https://tokrepo.com/en/workflows/replicate-cog-containerize-ml-models-with-one-yaml-file

Author: Replicate