Scripts · Apr 8, 2026 · 2 min read

Modal — Serverless GPU Cloud for AI Workloads

Run GPU workloads in the cloud with Python decorators. Modal provides serverless A100/H100 GPUs for model inference, fine-tuning, and batch jobs with zero infrastructure.

TL;DR
Run GPU workloads in the cloud with Python decorators. Serverless A100/H100 for inference and fine-tuning.
§01

What it is

Modal is a serverless cloud platform for running GPU workloads. You write Python functions, decorate them with @app.function(gpu='A100'), and Modal handles provisioning GPU instances, installing dependencies, and scaling. There is no infrastructure to manage: no Dockerfiles, no Kubernetes, no cloud console. Modal supports T4, A10G, L4, A100, and H100 GPUs for model inference, fine-tuning, batch processing, and web endpoints.

Modal suits ML engineers, AI researchers, and developers who need GPU compute without managing infrastructure. It is particularly useful for workloads that are too bursty or infrequent to justify dedicated GPU instances.

§02

How it saves time or tokens

Modal eliminates the hours spent setting up GPU infrastructure. No CUDA driver installation, no Docker image building, no autoscaling configuration. Cold start times are measured in seconds. You pay only for the GPU time you use (per-second billing), making it cost-effective for workloads that run for minutes or hours rather than continuously.
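
For example, at the roughly $3-4/hour A100 rate cited in the FAQ below, a ten-minute batch job costs around fifty to seventy cents, versus paying around the clock for a dedicated instance.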

§03

How to use

  1. Install the Modal SDK and run modal setup for one-time authentication
  2. Write a Python function decorated with @app.function(gpu='A100')
  3. Run modal run script.py to execute on a cloud GPU
§04

Example

import modal

# The default image lacks ML dependencies, so build one with what the
# remote function needs.
image = modal.Image.debian_slim().pip_install('transformers', 'torch', 'accelerate')

app = modal.App('my-ai-app')

@app.function(gpu='A100', image=image)
def run_inference(prompt: str) -> str:
    from transformers import pipeline

    # Loads on the remote GPU; this gated model also requires a Hugging Face token.
    pipe = pipeline('text-generation', model='meta-llama/Meta-Llama-3-8B-Instruct', device_map='auto')
    result = pipe(prompt, max_new_tokens=256)
    return result[0]['generated_text']

@app.local_entrypoint()
def main():
    output = run_inference.remote('Explain quantum computing.')
    print(output)
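
To install the SDK and run the script:
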
pip install modal
modal setup  # one-time auth
modal run inference.py
§05

Common pitfalls

  • Cold starts add a few seconds on first invocation; use keep_warm=1 for latency-sensitive endpoints (see the sketch after this list)
  • Large model downloads happen on every cold start unless you use Modal's volume mounts to cache weights
  • GPU availability varies by type; H100s may have wait times during peak demand periods
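
A minimal sketch of the keep_warm mitigation from the first bullet; the app name and function body are illustrative:

import modal

app = modal.App('warm-demo')  # illustrative app name

# keep_warm=1 keeps one container resident so requests skip the cold start,
# at the cost of paying for the idle container.
@app.function(gpu='A100', keep_warm=1)
def generate(prompt: str) -> str:
    return prompt.upper()  # placeholder for real inference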

Frequently Asked Questions

How much does Modal cost?

Modal charges per-second for GPU usage. An A100 costs approximately $3-4/hour. There is a free tier with $30/month of compute credits. No upfront commitment or reserved instances are required.

Which GPU types does Modal support?

Modal offers T4, A10G, L4, A100 (40GB and 80GB), and H100 GPUs. You specify the GPU type in your function decorator, and Modal provisions the right hardware automatically.

Can I deploy web endpoints on Modal?

Yes. Apply @modal.web_endpoint() beneath @app.function() to deploy a function as an HTTPS endpoint. Modal handles SSL, routing, and autoscaling. Endpoints can serve model inference via REST API.
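
A minimal sketch of this pattern; the endpoint-demo app name and hello function are illustrative:

import modal

app = modal.App('endpoint-demo')

@app.function()
@modal.web_endpoint(method='GET')
def hello(name: str = 'world') -> dict:
    # Modal serves this over HTTPS; query parameters map to function arguments.
    return {'hello': name}

Deploy it with modal deploy to get a persistent HTTPS URL.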

How do I cache model weights?

Use Modal Volumes to persist model weights across invocations. Download the model once to a volume, then mount it in subsequent runs. This eliminates repeated downloads and reduces cold start time.
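
One way to apply this, sketched with an illustrative volume name and a one-time Hugging Face download; adapt the paths and model to your setup:

import modal

image = modal.Image.debian_slim().pip_install('huggingface_hub')
vol = modal.Volume.from_name('model-weights', create_if_missing=True)  # illustrative name
app = modal.App('cache-weights')

@app.function(image=image, volumes={'/weights': vol})
def download_weights():
    from huggingface_hub import snapshot_download

    # Fetch the model once into the mounted volume, then persist the writes.
    snapshot_download('meta-llama/Meta-Llama-3-8B-Instruct', local_dir='/weights/llama-3-8b')
    vol.commit()

Subsequent functions mount the same volume and load from /weights/llama-3-8b instead of re-downloading.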

Can I fine-tune models on Modal?

Yes. Modal provides the GPU compute and storage needed for fine-tuning. You write your training script in Python, specify the GPU type, and Modal handles the infrastructure. Multi-GPU training with PyTorch DDP is supported.
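
A minimal sketch of the launch pattern; the image contents, timeout, and training body are illustrative assumptions:

import modal

image = modal.Image.debian_slim().pip_install('torch', 'transformers', 'datasets')
app = modal.App('finetune-demo')

@app.function(gpu='H100', image=image, timeout=4 * 60 * 60)
def train():
    import torch

    # Your ordinary training loop runs here on the remote GPU.
    assert torch.cuda.is_available()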

Source & Thanks

Created by Modal.

modal.com — Serverless GPU cloud
