# Modal Volumes — Persistent Storage for AI Workloads > Modal Volumes are content-addressed persistent storage for Modal. Mount as a Linux filesystem, share across runs, version with .commit(). ## Install Copy the content below into your project: ## Quick Use 1. `pip install modal` 2. `volume = modal.Volume.from_name("my-vol", create_if_missing=True)` 3. Mount in any Function or Sandbox via `volumes={"/path": volume}`, call `.commit()` to checkpoint --- ## Intro Modal Volumes give your Modal Functions and Sandboxes persistent storage that survives across runs. Mount as a regular Linux filesystem, share between Functions and Sandboxes, version checkpoints with `.commit()`. Best for: caching pip packages and model weights, ML training datasets, agent workspaces that span multiple runs. Works with: Modal SDK. Setup time: 1 minute. --- ### Create + mount ```python import modal app = modal.App("ml-pipeline") volume = modal.Volume.from_name("model-cache", create_if_missing=True) @app.function(volumes={"/cache": volume}) def download_model(): import huggingface_hub huggingface_hub.snapshot_download( "meta-llama/Llama-3-8B", cache_dir="/cache", ) volume.commit() # checkpoint to durable storage ``` ### Cache pip packages across runs ```python pip_cache = modal.Volume.from_name("pip-cache", create_if_missing=True) image = modal.Image.debian_slim().run_commands( "pip install --target=/pip-cache transformers torch", ).env({"PYTHONPATH": "/pip-cache"}) @app.function(image=image, volumes={"/pip-cache": pip_cache}) def train(): ... ``` Cold starts drop from minutes to seconds because packages are already on the volume. ### Share data between Function and Sandbox ```python data_vol = modal.Volume.from_name("ingest", create_if_missing=True) @app.function(volumes={"/data": data_vol}) def ingest(): download_csv_files("/data/raw/") data_vol.commit() @app.function(volumes={"/data": data_vol}) def analyze(): data_vol.reload() # pick up the latest committed snapshot return run_analysis("/data/raw/") ``` ### Use as agent workspace ```python workspace = modal.Volume.from_name("agent-runs", create_if_missing=True) # Each agent run gets its own subdirectory sb = modal.Sandbox.create( image=image, volumes={"/work": workspace}, workdir="/work/run-{run_id}", ) sb.exec("python", "agent.py") workspace.commit() # save the work ``` Later runs read the workspace history; the agent has long-term memory across sessions. --- ### FAQ **Q: How is Volume different from S3?** A: Volumes mount as a Linux filesystem (no S3 API translation). Faster random reads, integrated with Modal's image cache. For purely archival or external access, S3 is still the right choice. **Q: Are Volumes versioned?** A: Yes — every `.commit()` creates a new version. You can read prior versions with `volume.reload(commit_id=...)`. Useful for rollbacks and reproducibility. **Q: How big can a Volume be?** A: Practically unlimited. The hot tier is fast (NVMe-class); colder data is offloaded transparently. You're billed for storage used, but the limit isn't hard. --- ## Source & Thanks > Built by [Modal](https://github.com/modal-labs). Commercial product with free tier. > > [modal.com/docs](https://modal.com/docs/guide/volumes) — Volumes documentation --- ## 快速使用 1. `pip install modal` 2. `volume = modal.Volume.from_name("my-vol", create_if_missing=True)` 3. 在任何 Function 或 Sandbox 里用 `volumes={"/path": volume}` 挂载,调 `.commit()` 做 checkpoint --- ## 简介 Modal Volume 给 Modal Function 和 Sandbox 跨运行保留的持久存储。挂载成普通 Linux 文件系统、跨 Function / Sandbox 共享、用 `.commit()` 做版本 checkpoint。适合缓存 pip 包和模型权重、ML 训练数据集、跨多次运行的 agent 工作区。需要 Modal SDK。装机时间 1 分钟。 --- ### 创建 + 挂载 ```python import modal app = modal.App("ml-pipeline") volume = modal.Volume.from_name("model-cache", create_if_missing=True) @app.function(volumes={"/cache": volume}) def download_model(): import huggingface_hub huggingface_hub.snapshot_download( "meta-llama/Llama-3-8B", cache_dir="/cache", ) volume.commit() # checkpoint 到持久存储 ``` ### 跨运行缓存 pip 包 ```python pip_cache = modal.Volume.from_name("pip-cache", create_if_missing=True) image = modal.Image.debian_slim().run_commands( "pip install --target=/pip-cache transformers torch", ).env({"PYTHONPATH": "/pip-cache"}) @app.function(image=image, volumes={"/pip-cache": pip_cache}) def train(): ... ``` 冷启动从几分钟降到几秒,因为包已经在 volume 上了。 ### Function 和 Sandbox 之间共享数据 ```python data_vol = modal.Volume.from_name("ingest", create_if_missing=True) @app.function(volumes={"/data": data_vol}) def ingest(): download_csv_files("/data/raw/") data_vol.commit() @app.function(volumes={"/data": data_vol}) def analyze(): data_vol.reload() # 拿最新 commit 的快照 return run_analysis("/data/raw/") ``` ### 当 agent 工作区用 ```python workspace = modal.Volume.from_name("agent-runs", create_if_missing=True) # 每次 agent 运行有自己的子目录 sb = modal.Sandbox.create( image=image, volumes={"/work": workspace}, workdir="/work/run-{run_id}", ) sb.exec("python", "agent.py") workspace.commit() # 保存工作 ``` 后续运行读工作区历史,agent 有跨会话的长期记忆。 --- ### FAQ **Q: Volume 跟 S3 啥区别?** A: Volume 挂成 Linux 文件系统(不用 S3 API 转换)。随机读更快,跟 Modal 镜像缓存集成。纯归档或外部访问 S3 还是更合适。 **Q: Volume 有版本吗?** A: 有 —— 每次 `.commit()` 创建一个新版本。可以用 `volume.reload(commit_id=...)` 读旧版本。回滚和可复现有用。 **Q: Volume 能多大?** A: 实际上无上限。热层快(NVMe 级),冷数据透明卸载。按使用容量计费,但没有硬上限。 --- ## 来源与感谢 > Built by [Modal](https://github.com/modal-labs). Commercial product with free tier. > > [modal.com/docs](https://modal.com/docs/guide/volumes) — Volumes documentation --- Source: https://tokrepo.com/en/workflows/modal-volumes-persistent-storage-for-ai-workloads Author: Modal