# Modal — Serverless GPU Cloud for AI Workloads

> Run GPU workloads in the cloud with Python decorators. Modal provides serverless A100/H100 GPUs for model inference, fine-tuning, and batch jobs with zero infrastructure.

## Install

```bash
pip install modal
modal setup  # One-time auth
```

## Quick Use

Save as a script file and run:

```python
import modal

app = modal.App("my-ai-app")

@app.function(gpu="A100")
def run_inference(prompt: str) -> str:
    from transformers import pipeline

    pipe = pipeline(
        "text-generation",
        model="meta-llama/Llama-3.1-8B-Instruct",
        device="cuda",
    )
    return pipe(prompt, max_new_tokens=256)[0]["generated_text"]

@app.local_entrypoint()
def main():
    result = run_inference.remote("Explain quantum computing")
    print(result)
```

```bash
modal run my_app.py
```

## What is Modal?

Modal is a serverless GPU cloud platform where you define cloud functions with Python decorators. Add `@app.function(gpu="A100")` to any function and it runs on cloud GPUs — no Docker, no Kubernetes, no SSH. Modal handles container building, GPU provisioning, scaling, and shutdown automatically. You pay per second of compute.

**Answer-Ready**: Modal is a serverless GPU cloud for AI. Python decorators turn local functions into cloud GPU jobs. A100/H100 GPUs, auto-scaling, per-second billing. No Docker or K8s needed. Used for inference, fine-tuning, and batch processing. The simplest path from laptop to cloud GPU.

**Best for**: ML engineers needing cloud GPUs without infrastructure hassle.

**Works with**: Any Python ML library, PyTorch, HuggingFace, vLLM.

**Setup time**: Under 3 minutes.

## Core Features

### 1. GPU Selection

```python
@app.function(gpu="T4")      # Budget inference
@app.function(gpu="A10G")    # Mid-range
@app.function(gpu="A100")    # Standard training/inference
@app.function(gpu="H100")    # Maximum performance
@app.function(gpu="A100:4")  # Multi-GPU
```

### 2. Container Definition (No Dockerfile)

```python
image = (
    modal.Image.debian_slim()
    .pip_install("torch", "transformers", "accelerate")
    .run_commands("apt-get install -y ffmpeg")
)

@app.function(image=image, gpu="A100")
def train():
    ...
```

### 3. Web Endpoints

```python
@app.function(gpu="A100")
@modal.web_endpoint()
def generate(prompt: str):
    return {"text": run_model(prompt)}

# Deployed at: https://your-app--generate.modal.run
```

### 4. Scheduled Jobs

```python
@app.function(schedule=modal.Cron("0 */6 * * *"))
def batch_process():
    # Runs every 6 hours
    ...
```

### 5. Volumes (Persistent Storage)

```python
vol = modal.Volume.from_name("model-cache", create_if_missing=True)

@app.function(volumes={"/models": vol}, gpu="A100")
def inference():
    # Models cached across runs
    model = load_model("/models/llama-3.1")
```

## Pricing

| GPU | Price/hour | Best For |
|-----|-----------|----------|
| T4 | $0.59 | Light inference |
| A10G | $1.10 | Medium workloads |
| A100 40GB | $3.72 | Training/inference |
| A100 80GB | $4.58 | Large models |
| H100 | $6.98 | Maximum speed |

Per-second billing. No minimum.

## Modal vs Alternatives

| Feature | Modal | Replicate | RunPod | Lambda |
|---------|-------|-----------|--------|--------|
| Interface | Python decorators | API calls | SSH/Docker | SSH/Docker |
| Setup | 3 minutes | 2 minutes | 10 minutes | 15 minutes |
| Custom code | Full control | Cog format | Full control | Full control |
| Auto-scaling | Yes | Yes | Manual | Manual |
| Web endpoints | Built-in | No | Manual | Manual |
| Cold start | ~30s | ~15s | None (always-on) | None |

## FAQ

**Q: How fast is cold start?**
A: ~30 seconds for the first run. Warm containers respond in under 1 second. Use `keep_warm=1` for always-on.

**Q: Can I fine-tune models?**
A: Yes, full GPU access. Run any PyTorch/HuggingFace training loop on A100/H100.

**Q: How does billing work?**
A: Per-second billing for GPU time. Container build time is free. No charges when idle.
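To make the per-second billing above concrete, here is a minimal sketch of a cost estimate using the hourly rates from the pricing table. The `HOURLY_RATES` mapping and `estimate_cost` helper are illustrative only — they are not part of Modal's API, and actual invoices come from Modal's dashboard.

```python
# Illustrative helper (not a Modal API): estimate job cost under
# per-second billing, using the hourly rates from the pricing table.
HOURLY_RATES = {  # USD per hour
    "T4": 0.59,
    "A10G": 1.10,
    "A100-40GB": 3.72,
    "A100-80GB": 4.58,
    "H100": 6.98,
}

def estimate_cost(gpu: str, seconds: float, num_gpus: int = 1) -> float:
    """Per-second billing: (hourly rate / 3600) * seconds * GPU count."""
    return round(HOURLY_RATES[gpu] / 3600 * seconds * num_gpus, 4)

# A 90-second inference burst on one H100:
print(estimate_cost("H100", 90))                     # -> 0.1745
# A 2-hour fine-tune on 4x A100 80GB:
print(estimate_cost("A100-80GB", 7200, num_gpus=4))  # -> 36.64
```

Because there is no minimum charge, a short burst costs only its actual runtime — fractions of a cent for a few seconds on a T4.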
## Source & Thanks

> Created by [Modal](https://modal.com).
>
> [modal.com](https://modal.com) — Serverless GPU cloud

---

Source: https://tokrepo.com/en/workflows/a3ae2bd0-8b48-4cdd-9bb0-9c84f5272408
Author: AI Open Source