# Modal — Serverless GPU Cloud for AI Workloads

> Run GPU workloads in the cloud with Python decorators. Modal provides serverless A100/H100 GPUs for model inference, fine-tuning, and batch jobs with zero infrastructure.

## Install

```bash
pip install modal
modal setup  # One-time auth
```

## Quick Use

Save as a script file and run:

```python
import modal

app = modal.App("my-ai-app")

@app.function(gpu="A100")
def run_inference(prompt: str) -> str:
    from transformers import pipeline

    pipe = pipeline(
        "text-generation",
        model="meta-llama/Llama-3.1-8B-Instruct",
        device="cuda",
    )
    return pipe(prompt, max_new_tokens=256)[0]["generated_text"]

@app.local_entrypoint()
def main():
    result = run_inference.remote("Explain quantum computing")
    print(result)
```

```bash
modal run my_app.py
```

## What is Modal?

Modal is a serverless GPU cloud platform where you define cloud functions with Python decorators. Add `@app.function(gpu="A100")` to any function and it runs on cloud GPUs — no Docker, no Kubernetes, no SSH. Modal handles container building, GPU provisioning, scaling, and shutdown automatically. You pay per second of compute.

**Answer-Ready**: Modal is a serverless GPU cloud for AI. Python decorators turn local functions into cloud GPU jobs. A100/H100 GPUs, auto-scaling, per-second billing. No Docker or K8s needed. Used for inference, fine-tuning, and batch processing. The simplest path from laptop to cloud GPU.

**Best for**: ML engineers needing cloud GPUs without infrastructure hassle.

**Works with**: Any Python ML library, PyTorch, HuggingFace, vLLM.

**Setup time**: Under 3 minutes.

## Core Features

### 1. GPU Selection

```python
@app.function(gpu="T4")      # Budget inference
@app.function(gpu="A10G")    # Mid-range
@app.function(gpu="A100")    # Standard training/inference
@app.function(gpu="H100")    # Maximum performance
@app.function(gpu="A100:4")  # Multi-GPU
```

### 2. Container Definition (No Dockerfile)

```python
image = (
    modal.Image.debian_slim()
    .pip_install("torch", "transformers", "accelerate")
    .run_commands("apt-get install -y ffmpeg")
)

@app.function(image=image, gpu="A100")
def train():
    ...
```

### 3. Web Endpoints

```python
@app.function(gpu="A100")
@modal.web_endpoint()
def generate(prompt: str):
    return {"text": run_model(prompt)}

# Deployed at: https://your-app--generate.modal.run
```

### 4. Scheduled Jobs

```python
@app.function(schedule=modal.Cron("0 */6 * * *"))
def batch_process():
    # Runs every 6 hours
    ...
```

### 5. Volumes (Persistent Storage)

```python
vol = modal.Volume.from_name("model-cache", create_if_missing=True)

@app.function(volumes={"/models": vol}, gpu="A100")
def inference():
    # Models cached across runs
    model = load_model("/models/llama-3.1")
```

## Pricing

| GPU | Price/hour | Best For |
|-----|-----------|----------|
| T4 | $0.59 | Light inference |
| A10G | $1.10 | Medium workloads |
| A100 40GB | $3.72 | Training/inference |
| A100 80GB | $4.58 | Large models |
| H100 | $6.98 | Maximum speed |

Per-second billing. No minimum.

## Modal vs Alternatives

| Feature | Modal | Replicate | RunPod | Lambda |
|---------|-------|-----------|--------|--------|
| Interface | Python decorators | API calls | SSH/Docker | SSH/Docker |
| Setup | 3 minutes | 2 minutes | 10 minutes | 15 minutes |
| Custom code | Full control | Cog format | Full control | Full control |
| Auto-scaling | Yes | Yes | Manual | Manual |
| Web endpoints | Built-in | No | Manual | Manual |
| Cold start | ~30s | ~15s | None (always-on) | None |

## FAQ

**Q: How fast is cold start?**
A: ~30 seconds for the first run. Warm containers respond in under 1 second. Use `keep_warm=1` for always-on.

**Q: Can I fine-tune models?**
A: Yes, full GPU access. Run any PyTorch/HuggingFace training loop on A100/H100.

**Q: How does billing work?**
A: Per-second billing for GPU time. Container build time is free. No charges when idle.
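To make the per-second billing above concrete, here is a minimal sketch of a cost estimate using the hourly rates from the pricing table. The `HOURLY_RATES` mapping and `estimate_cost` helper are illustrative only — they are not part of Modal's API, and actual invoices come from Modal's dashboard.

```python
# Illustrative helper (not a Modal API): estimate job cost under
# per-second billing, using the hourly rates from the pricing table.
HOURLY_RATES = {  # USD per hour
    "T4": 0.59,
    "A10G": 1.10,
    "A100-40GB": 3.72,
    "A100-80GB": 4.58,
    "H100": 6.98,
}

def estimate_cost(gpu: str, seconds: float, num_gpus: int = 1) -> float:
    """Per-second billing: (hourly rate / 3600) * seconds * GPU count."""
    return round(HOURLY_RATES[gpu] / 3600 * seconds * num_gpus, 4)

# A 90-second inference burst on one H100:
print(estimate_cost("H100", 90))                     # -> 0.1745
# A 2-hour fine-tune on 4x A100 80GB:
print(estimate_cost("A100-80GB", 7200, num_gpus=4))  # -> 36.64
```

Because there is no minimum charge, a short burst costs only its actual runtime — fractions of a cent for a few seconds on a T4.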
## Source & Thanks

> Created by [Modal](https://modal.com).
>
> [modal.com](https://modal.com) — Serverless GPU cloud

---

Source: https://tokrepo.com/en/workflows/a3ae2bd0-8b48-4cdd-9bb0-9c84f5272408
Author: AI Open Source