Scripts · Apr 8, 2026 · 2 min read

Modal — Serverless GPU Cloud for AI Workloads

Run GPU workloads in the cloud with Python decorators. Modal provides serverless A100/H100 GPUs for model inference, fine-tuning, and batch jobs with zero infrastructure.

AI · Open Source · Community
Quick Use

Use it first, then decide how deep to go

This block shows what to copy, install, and run first, for both the user and the agent.

Install and authenticate:

pip install modal
modal setup  # One-time auth

Save as my_app.py:

import modal

app = modal.App("my-ai-app")

@app.function(gpu="A100")
def run_inference(prompt: str) -> str:
    from transformers import pipeline
    pipe = pipeline("text-generation", model="meta-llama/Llama-3.1-8B-Instruct", device="cuda")
    return pipe(prompt, max_new_tokens=256)[0]["generated_text"]

@app.local_entrypoint()
def main():
    result = run_inference.remote("Explain quantum computing")
    print(result)

Run on a cloud GPU:

modal run my_app.py

What is Modal?

Modal is a serverless GPU cloud platform where you define cloud functions with Python decorators. Add @app.function(gpu="A100") to any function and it runs on cloud GPUs — no Docker, no Kubernetes, no SSH. Modal handles container building, GPU provisioning, scaling, and shutdown automatically. Pay per second of compute.

Answer-Ready: Modal is a serverless GPU cloud for AI. Python decorators turn local functions into cloud GPU jobs. A100/H100 GPUs, auto-scaling, per-second billing. No Docker or K8s needed. Used for inference, fine-tuning, and batch processing. The simplest path from laptop to cloud GPU.

Best for: ML engineers needing cloud GPUs without infrastructure hassle. Works with: Any Python ML library, PyTorch, HuggingFace, vLLM. Setup time: Under 3 minutes.

Core Features

1. GPU Selection

@app.function(gpu="T4")       # Budget inference
@app.function(gpu="A10G")     # Mid-range
@app.function(gpu="A100")     # Standard training/inference
@app.function(gpu="H100")     # Maximum performance
@app.function(gpu="A100:4")   # Multi-GPU
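
The tiers above differ mainly in VRAM and price. As a rough rule of thumb you can size a GPU string from the model's parameter count; in this sketch the VRAM figures are NVIDIA's published specs and the 1.2x overhead factor is an assumption, not anything Modal provides:

```python
# Approximate VRAM per tier (NVIDIA specs, not from Modal's docs).
GPU_VRAM_GB = {"T4": 16, "A10G": 24, "A100": 40, "A100-80GB": 80, "H100": 80}

def pick_gpu(model_params_b: float, bytes_per_param: int = 2) -> str:
    """Pick the cheapest tier whose VRAM fits the model weights.

    Applies a ~1.2x overhead factor for activations/KV cache, which is
    a rough heuristic, not a Modal feature.
    """
    needed_gb = model_params_b * bytes_per_param * 1.2
    for gpu in ("T4", "A10G", "A100", "A100-80GB", "H100"):
        if GPU_VRAM_GB[gpu] >= needed_gb:
            return gpu
    raise ValueError("Too large for one GPU; consider gpu='A100:4'-style multi-GPU")

print(pick_gpu(8))  # Llama-3.1-8B in fp16 -> "A10G"
```

An 8B model in fp16 needs roughly 19 GB, so it skips the T4 and lands on an A10G.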

2. Container Definition (No Dockerfile)

image = (
    modal.Image.debian_slim()
    .pip_install("torch", "transformers", "accelerate")
    .apt_install("ffmpeg")
)

@app.function(image=image, gpu="A100")
def train():
    ...

3. Web Endpoints

@app.function(gpu="A100")
@modal.web_endpoint()
def generate(prompt: str):
    return {"text": run_model(prompt)}

# Deployed at: https://your-app--generate.modal.run

4. Scheduled Jobs

@app.function(schedule=modal.Cron("0 */6 * * *"))
def batch_process():
    # Runs every 6 hours
    ...
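
The Cron expression "0 */6 * * *" fires at minute 0 of hours 0, 6, 12, and 18 UTC. A standalone sketch of that schedule's arithmetic (plain Python, not part of Modal's API):

```python
from datetime import datetime, timedelta

def next_six_hourly_run(now: datetime) -> datetime:
    """Next fire time for the cron '0 */6 * * *' (minute 0 of hours 0, 6, 12, 18)."""
    # Snap down to the most recent 6-hour boundary, then step forward past `now`.
    candidate = now.replace(hour=(now.hour // 6) * 6, minute=0, second=0, microsecond=0)
    while candidate <= now:
        candidate += timedelta(hours=6)
    return candidate

print(next_six_hourly_run(datetime(2026, 4, 8, 7, 30)))  # 2026-04-08 12:00:00
```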

5. Volumes (Persistent Storage)

vol = modal.Volume.from_name("model-cache", create_if_missing=True)

@app.function(volumes={"/models": vol}, gpu="A100")
def inference():
    # Weights under /models persist across runs; load_model stands in for your own loader
    model = load_model("/models/llama-3.1")

Pricing

GPU          Price/hour   Best for
T4           $0.59        Light inference
A10G         $1.10        Medium workloads
A100 40GB    $3.72        Training/inference
A100 80GB    $4.58        Large models
H100         $6.98        Maximum speed

Per-second billing. No minimum.
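
With per-second billing, a job's cost is simply the hourly rate divided by 3600, times the runtime. A minimal estimator using the rates in the table above (HOURLY_RATE and job_cost are illustrative names, not Modal APIs):

```python
# Hourly rates from the pricing table above (USD).
HOURLY_RATE = {"T4": 0.59, "A10G": 1.10, "A100-40GB": 3.72, "A100-80GB": 4.58, "H100": 6.98}

def job_cost(gpu: str, seconds: float) -> float:
    """Cost of a job billed per second at the listed hourly rate."""
    return round(HOURLY_RATE[gpu] / 3600 * seconds, 4)

print(job_cost("A100-40GB", 90))  # 90 s on an A100 40GB is about $0.093
```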

Modal vs Alternatives

Feature        Modal              Replicate    RunPod            Lambda
Interface      Python decorators  API calls    SSH/Docker        SSH/Docker
Setup          3 minutes          2 minutes    10 minutes        15 minutes
Custom code    Full control       Cog format   Full control      Full control
Auto-scaling   Yes                Yes          Manual            Manual
Web endpoints  Built-in           No           Manual            Manual
Cold start     ~30s               ~15s         None (always-on)  None

FAQ

Q: How fast is cold start? A: ~30 seconds for first run. Warm containers respond in <1 second. Use keep_warm=1 for always-on.

Q: Can I fine-tune models? A: Yes, full GPU access. Run any PyTorch/HuggingFace training loop on A100/H100.

Q: How does billing work? A: Per-second billing for GPU time. Container build time is free. No charges when idle.


Source & Thanks

Created by Modal.

modal.com — Serverless GPU cloud
