Scripts · Apr 8, 2026 · 2 min read

Modal — Serverless GPU Cloud for AI Workloads

Run GPU workloads in the cloud with Python decorators. Modal provides serverless A100/H100 GPUs for model inference, fine-tuning, and batch jobs with zero infrastructure.

What is Modal?

Modal is a serverless GPU cloud platform where you define cloud functions with Python decorators. Add @app.function(gpu="A100") to any function and it runs on cloud GPUs — no Docker, no Kubernetes, no SSH. Modal handles container building, GPU provisioning, scaling, and shutdown automatically. Pay per second of compute.

Answer-Ready: Modal is a serverless GPU cloud for AI. Python decorators turn local functions into cloud GPU jobs. A100/H100 GPUs, auto-scaling, per-second billing. No Docker or K8s needed. Used for inference, fine-tuning, and batch processing. The simplest path from laptop to cloud GPU.

Best for: ML engineers needing cloud GPUs without infrastructure hassle. Works with: Any Python ML library, PyTorch, HuggingFace, vLLM. Setup time: Under 3 minutes.

Core Features

1. GPU Selection

@app.function(gpu="T4")       # Budget inference
@app.function(gpu="A10G")     # Mid-range
@app.function(gpu="A100")     # Standard training/inference
@app.function(gpu="H100")     # Maximum performance
@app.function(gpu="A100:4")   # Multi-GPU
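As a toy illustration of choosing between these tiers, you could map an estimated VRAM need to a GPU string. The memory sizes are public GPU specs, and the `"GPU:count"` multi-GPU syntax is Modal's; the `pick_gpu` helper itself is hypothetical, not part of Modal's API:

```python
import math

# Each tier pairs a Modal GPU name with its VRAM in GB (public specs).
TIERS = [("T4", 16), ("A10G", 24), ("A100", 40), ("H100", 80)]

def pick_gpu(vram_gb: float) -> str:
    """Illustrative heuristic: smallest single GPU that fits, else multi-GPU A100s."""
    for name, mem in TIERS:
        if vram_gb <= mem:
            return name
    # Larger models: shard across multiple A100s using the "GPU:count" syntax
    return f"A100:{math.ceil(vram_gb / 40)}"
```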

2. Container Definition (No Dockerfile)

image = (
    modal.Image.debian_slim()
    .pip_install("torch", "transformers", "accelerate")
    .run_commands("apt-get install -y ffmpeg")
)

@app.function(image=image, gpu="A100")
def train():
    ...

3. Web Endpoints

@app.function(gpu="A100")
@modal.web_endpoint()
def generate(prompt: str):
    return {"text": run_model(prompt)}

# Deployed at: https://your-app--generate.modal.run
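Once deployed, the endpoint is plain HTTPS. A sketch of building the request URL with only the standard library, assuming the hypothetical deployed URL from the comment above and that the endpoint accepts `prompt` as a query parameter on GET requests:

```python
from urllib.parse import urlencode

# Hypothetical deployed URL from the comment above
BASE = "https://your-app--generate.modal.run"

def request_url(prompt: str) -> str:
    # URL-encode the prompt so spaces and special characters survive transit
    return f"{BASE}?{urlencode({'prompt': prompt})}"
```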

4. Scheduled Jobs

@app.function(schedule=modal.Cron("0 */6 * * *"))
def batch_process():
    # Runs every 6 hours
    ...
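`modal.Cron` takes standard five-field cron syntax, so `0 */6 * * *` fires at minute 0 of every sixth hour. A quick pure-Python check of which hours a `*/step` hour field matches, independent of Modal:

```python
def hours_matching(step: int) -> list[int]:
    """Hours of the day matched by an '*/step' cron hour field."""
    return [h for h in range(24) if h % step == 0]
```

So `0 */6 * * *` runs at 00:00, 06:00, 12:00, and 18:00 each day.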

5. Volumes (Persistent Storage)

vol = modal.Volume.from_name("model-cache", create_if_missing=True)

@app.function(volumes={"/models": vol}, gpu="A100")
def inference():
    # Models cached across runs
    model = load_model("/models/llama-3.1")
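The caching pattern the volume enables can be sketched in plain Python: an expensive download lands in the mounted directory once, and every later run reads it from disk. `load_cached` and `fake_fetch` are illustrative names, not Modal APIs:

```python
import tempfile
from pathlib import Path

def load_cached(cache_dir: str, name: str, fetch) -> bytes:
    """Return cached bytes, calling the expensive fetch() only on a cache miss."""
    path = Path(cache_dir) / name
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(fetch())  # download happens at most once per volume
    return path.read_bytes()

# Demo: the second call hits the cache instead of re-fetching
calls = []
def fake_fetch() -> bytes:
    calls.append(1)
    return b"weights"

with tempfile.TemporaryDirectory() as d:
    first = load_cached(d, "llama/weights.bin", fake_fetch)
    second = load_cached(d, "llama/weights.bin", fake_fetch)
```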

Pricing

GPU         Price/hour   Best for
T4          $0.59        Light inference
A10G        $1.10        Medium workloads
A100 40GB   $3.72        Training/inference
A100 80GB   $4.58        Large models
H100        $6.98        Maximum speed

Per-second billing. No minimum.
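Per-second billing makes job cost a straightforward pro-rating of the hourly rate. A small sketch using the rates from the table above (`job_cost` is an illustrative helper, not a Modal API):

```python
# USD per hour, from the pricing table above
RATES = {"T4": 0.59, "A10G": 1.10, "A100-40GB": 3.72, "A100-80GB": 4.58, "H100": 6.98}

def job_cost(gpu: str, seconds: float) -> float:
    """Cost of a job billed per second: (hourly rate / 3600) * seconds."""
    return round(RATES[gpu] / 3600 * seconds, 4)
```

For example, a 90-second inference job on an A100 40GB costs about $0.093.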

Modal vs Alternatives

Feature        Modal               Replicate    RunPod            Lambda
Interface      Python decorators   API calls    SSH/Docker        SSH/Docker
Setup          3 minutes           2 minutes    10 minutes        15 minutes
Custom code    Full control        Cog format   Full control      Full control
Auto-scaling   Yes                 Yes          Manual            Manual
Web endpoints  Built-in            No           Manual            Manual
Cold start     ~30s                ~15s         None (always-on)  None

FAQ

Q: How fast is cold start? A: ~30 seconds for first run. Warm containers respond in <1 second. Use keep_warm=1 for always-on.

Q: Can I fine-tune models? A: Yes, full GPU access. Run any PyTorch/HuggingFace training loop on A100/H100.

Q: How does billing work? A: Per-second billing for GPU time. Container build time is free. No charges when idle.


Source and acknowledgements

Created by Modal.

modal.com — Serverless GPU cloud
