Modal — Serverless GPU Cloud for AI Workloads
Run GPU workloads in the cloud with Python decorators. Modal provides serverless A100/H100 GPUs for model inference, fine-tuning, and batch jobs with zero infrastructure.
What it is
Modal is a serverless cloud platform for running GPU workloads. You write Python functions, decorate them with @app.function(gpu='A100'), and Modal handles provisioning GPU instances, installing dependencies, and scaling. There is no infrastructure to manage: no Dockerfiles, no Kubernetes, no cloud console. Modal supports A100, H100, and T4 GPUs for model inference, fine-tuning, batch processing, and web endpoints.
ML engineers, AI researchers, and developers who need GPU compute without infrastructure management benefit from Modal. It is particularly useful for workloads that are too bursty or infrequent to justify dedicated GPU instances.
How it saves time or tokens
Modal eliminates the hours spent setting up GPU infrastructure. No CUDA driver installation, no Docker image building, no autoscaling configuration. Cold start times are measured in seconds. You pay only for the GPU time you use (per-second billing), making it cost-effective for workloads that run for minutes or hours rather than continuously.
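A quick back-of-envelope calculation shows why per-second billing matters for bursty workloads. The sketch below assumes a $3.50/hour A100 rate for illustration (the FAQ below quotes roughly $3-4/hour; check Modal's pricing page for current numbers):

```python
# Compare per-second serverless billing against an always-on dedicated
# GPU instance. The hourly rate is an illustrative assumption.

A100_RATE_PER_HOUR = 3.50  # assumed A100 price in USD/hour
RATE_PER_SECOND = A100_RATE_PER_HOUR / 3600

def serverless_cost(gpu_seconds: float) -> float:
    """Cost when billed only for seconds of actual GPU execution."""
    return gpu_seconds * RATE_PER_SECOND

def dedicated_cost(hours_provisioned: float) -> float:
    """Cost of a dedicated instance that bills whether busy or idle."""
    return hours_provisioned * A100_RATE_PER_HOUR

# Bursty workload: 20 jobs/day at 90 GPU-seconds each = 1800 s of compute.
print(f"serverless per day: ${serverless_cost(20 * 90):.2f}")  # $1.75
print(f"dedicated per day:  ${dedicated_cost(24):.2f}")        # $84.00
```

For a workload like this, the dedicated instance sits idle more than 97% of the day, which is exactly the case the per-second model is designed for.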
How to use
- Install the Modal SDK and run modal setup for one-time authentication
- Write a Python function decorated with @app.function(gpu='A100')
- Run modal run script.py to execute on a cloud GPU
Example
import modal

app = modal.App('my-ai-app')

@app.function(gpu='A100')
def run_inference(prompt: str) -> str:
    from transformers import pipeline
    pipe = pipeline('text-generation', model='meta-llama/Llama-3-8B-Instruct')
    result = pipe(prompt, max_new_tokens=256)
    return result[0]['generated_text']

@app.local_entrypoint()
def main():
    output = run_inference.remote('Explain quantum computing.')
    print(output)

Run it with:

pip install modal
modal setup  # one-time auth
modal run inference.py
Related on TokRepo
- AI tools for coding — Browse AI development tools and platforms
- Featured workflows — Discover top-rated workflows
Common pitfalls
- Cold starts add a few seconds on first invocation; use keep_warm=1 for latency-sensitive endpoints
- Large model downloads happen on every cold start unless you use Modal's volume mounts to cache weights
- GPU availability varies by type; H100s may have wait times during peak demand periods
Frequently Asked Questions
How much does Modal cost?
Modal charges per second for GPU usage. An A100 costs approximately $3-4/hour. There is a free tier with $30/month of compute credits. No upfront commitment or reserved instances are required.
Which GPU types does Modal offer?
Modal offers T4, A10G, L4, A100 (40GB and 80GB), and H100 GPUs. You specify the GPU type in your function decorator, and Modal provisions the right hardware automatically.
Can Modal serve web endpoints?
Yes. Use the @app.web_endpoint() decorator to deploy a function as an HTTPS endpoint. Modal handles SSL, routing, and autoscaling. Endpoints can serve model inference via REST API.
How do I avoid re-downloading model weights on every run?
Use Modal Volumes to persist model weights across invocations. Download the model once to a volume, then mount it in subsequent runs. This eliminates repeated downloads and reduces cold start time.
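A minimal sketch of this caching pattern, as a deployment script that needs a Modal account to run. The volume name 'model-weights' and the use of the Hugging Face HF_HOME cache directory are illustrative assumptions:

```python
import modal

app = modal.App('cached-inference')

# Persistent volume for model weights; created on first use (assumed name).
weights = modal.Volume.from_name('model-weights', create_if_missing=True)

@app.function(gpu='A100', volumes={'/models': weights})
def run_inference(prompt: str) -> str:
    import os
    from transformers import pipeline

    # Point the Hugging Face cache at the mounted volume so the first cold
    # start downloads weights once; later runs read them from the volume.
    os.environ['HF_HOME'] = '/models'
    pipe = pipeline('text-generation', model='meta-llama/Llama-3-8B-Instruct')
    out = pipe(prompt, max_new_tokens=256)

    # Persist any newly downloaded files back to the volume.
    weights.commit()
    return out[0]['generated_text']
```

The first invocation pays the download cost; subsequent cold starts only load the cached weights from the volume.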
Can I fine-tune models on Modal?
Yes. Modal provides the GPU compute and storage needed for fine-tuning. You write your training script in Python, specify the GPU type, and Modal handles the infrastructure. Multi-GPU training with PyTorch DDP is supported.
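A sketch of requesting multiple GPUs for a training job. The 'A100:2' count syntax and the skeletal train function are assumptions for illustration, not a complete training setup:

```python
import modal

app = modal.App('finetune-job')

# Request two A100s and a 4-hour execution cap (assumed count syntax).
@app.function(gpu='A100:2', timeout=4 * 60 * 60)
def train():
    import torch

    # Both GPUs are visible inside the container; a PyTorch DDP launcher
    # (e.g. torch.multiprocessing spawn) would distribute the loop here.
    assert torch.cuda.device_count() == 2
    ...
```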
Citations (3)
- Modal Website — Serverless GPU cloud with Python decorators
- Modal Documentation — A100/H100 GPU support with per-second billing
- Modal GitHub — Serverless infrastructure for model inference and fine-tuning