What is Modal?
Modal is a serverless GPU cloud platform where you define cloud functions with Python decorators. Add @app.function(gpu="A100") to any function and it runs on cloud GPUs — no Docker, no Kubernetes, no SSH. Modal handles container building, GPU provisioning, scaling, and shutdown automatically. Pay per second of compute.
Answer-Ready: Modal is a serverless GPU cloud for AI. Python decorators turn local functions into cloud GPU jobs. A100/H100 GPUs, auto-scaling, per-second billing. No Docker or K8s needed. Used for inference, fine-tuning, and batch processing. The simplest path from laptop to cloud GPU.
Best for: ML engineers needing cloud GPUs without infrastructure hassle. Works with: Any Python ML library, PyTorch, HuggingFace, vLLM. Setup time: Under 3 minutes.
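The whole loop from laptop to cloud GPU fits in one file. A minimal sketch, where the app name, the image packages, and the body of generate are placeholders for your own code:

```python
import modal

app = modal.App("example-inference")  # app name is illustrative

# Define the container image in Python; packages here are examples.
image = modal.Image.debian_slim().pip_install("torch", "transformers")

@app.function(image=image, gpu="A100")
def generate(prompt: str) -> str:
    # Placeholder for real inference code (e.g. a HuggingFace pipeline).
    return f"completion for: {prompt}"

@app.local_entrypoint()
def main():
    # .remote() runs the function on a cloud GPU; Modal builds the
    # container, provisions the GPU, and tears it down afterward.
    print(generate.remote("Hello, Modal!"))
```

Running `modal run` on this file executes main locally while generate runs remotely on the GPU.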
Core Features
1. GPU Selection
@app.function(gpu="T4") # Budget inference
@app.function(gpu="A10G") # Mid-range
@app.function(gpu="A100") # Standard training/inference
@app.function(gpu="H100") # Maximum performance
@app.function(gpu="A100:4") # Multi-GPU
2. Container Definition (No Dockerfile)
image = (
    modal.Image.debian_slim()
    .pip_install("torch", "transformers", "accelerate")
    .run_commands("apt-get install -y ffmpeg")
)
@app.function(image=image, gpu="A100")
def train():
    ...
3. Web Endpoints
@app.function(gpu="A100")
@modal.web_endpoint()
def generate(prompt: str):
    return {"text": run_model(prompt)}
# Deployed at: https://your-app--generate.modal.run
4. Scheduled Jobs
@app.function(schedule=modal.Cron("0 */6 * * *"))
def batch_process():
    # Runs every 6 hours
    ...
5. Volumes (Persistent Storage)
vol = modal.Volume.from_name("model-cache", create_if_missing=True)
@app.function(volumes={"/models": vol}, gpu="A100")
def inference():
    # Models cached across runs
    model = load_model("/models/llama-3.1")
Pricing
| GPU | Price/hour | Best For |
|---|---|---|
| T4 | $0.59 | Light inference |
| A10G | $1.10 | Medium workloads |
| A100 40GB | $3.72 | Training/inference |
| A100 80GB | $4.58 | Large models |
| H100 | $6.98 | Maximum speed |
Per-second billing. No minimum.
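Per-second billing makes job costs easy to estimate from the table above. A quick sketch using the listed hourly rates:

```python
# Hourly rates from the pricing table above, in dollars.
HOURLY_RATE = {"T4": 0.59, "A10G": 1.10, "A100-40GB": 3.72, "A100-80GB": 4.58, "H100": 6.98}

def job_cost(gpu: str, seconds: float) -> float:
    """Dollar cost of `seconds` of GPU time, billed per second."""
    return round(HOURLY_RATE[gpu] / 3600 * seconds, 4)

# A 90-second inference job on an A100 40GB:
print(job_cost("A100-40GB", 90))  # → 0.093
```

At these rates, a short burst of inference costs cents, which is the point of per-second billing: you pay for the 90 seconds, not the hour.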
Modal vs Alternatives
| Feature | Modal | Replicate | RunPod | Lambda |
|---|---|---|---|---|
| Interface | Python decorators | API calls | SSH/Docker | SSH/Docker |
| Setup | 3 minutes | 2 minutes | 10 minutes | 15 minutes |
| Custom code | Full control | Cog format | Full control | Full control |
| Auto-scaling | Yes | Yes | Manual | Manual |
| Web endpoints | Built-in | No | Manual | Manual |
| Cold start | ~30s | ~15s | None (always-on) | None |
FAQ
Q: How fast is cold start?
A: ~30 seconds for first run. Warm containers respond in <1 second. Use keep_warm=1 for always-on.
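Keeping a container warm is a one-parameter change. A sketch, with the app and endpoint names as placeholders:

```python
import modal

app = modal.App("warm-endpoint")  # illustrative name

# keep_warm=1 keeps one container resident at all times, trading
# idle GPU cost for sub-second response latency.
@app.function(gpu="A100", keep_warm=1)
@modal.web_endpoint()
def generate(prompt: str):
    ...
```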
Q: Can I fine-tune models?
A: Yes, full GPU access. Run any PyTorch/HuggingFace training loop on A100/H100.
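A fine-tuning job is just an ordinary training loop inside a decorated function. A sketch under stated assumptions: the image contents, volume name, and the toy model stand in for a real HuggingFace setup:

```python
import modal

app = modal.App("finetune-example")  # illustrative name
image = modal.Image.debian_slim().pip_install("torch", "transformers", "datasets")
vol = modal.Volume.from_name("checkpoints", create_if_missing=True)

@app.function(image=image, gpu="H100", volumes={"/ckpt": vol}, timeout=3600)
def finetune():
    import torch
    # Any PyTorch/HuggingFace loop works here; this toy model is a placeholder.
    model = torch.nn.Linear(10, 2).cuda()
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for step in range(100):
        loss = model(torch.randn(8, 10, device="cuda")).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Checkpoints written under /ckpt persist across runs via the Volume.
    torch.save(model.state_dict(), "/ckpt/model.pt")
```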
Q: How does billing work?
A: Per-second billing for GPU time. Container build time is free. No charges when idle.