Introduction
Metaflow was built at Netflix to let data scientists write production ML pipelines using regular Python. It manages infrastructure concerns—versioning, compute scaling, dependency management—behind a simple decorator-based API, so teams can focus on modeling rather than plumbing.
What Metaflow Does
- Structures ML projects as flows, where steps are organized as a DAG
- Automatically versions every run's data, code, and dependencies
- Scales individual steps to cloud compute (AWS Batch, Kubernetes) with a single decorator
- Provides a built-in client for inspecting past runs and retrieving artifacts
- Supports branching and joining for parallel workloads within a flow
Architecture Overview
A Metaflow flow is a Python class where each method decorated with @step becomes a node in a DAG. When executed, the runtime snapshots code, data artifacts, and environment metadata for each step. Steps can be dispatched to local processes, AWS Batch, or Kubernetes. A metadata service tracks all runs, and a datastore (S3 or local filesystem) persists artifacts so any past result can be retrieved programmatically.
Self-Hosting & Configuration
- Install from PyPI for local execution with no extra infrastructure
- Configure AWS integration by running metaflow configure aws, which sets up S3 as the datastore and Batch as the compute backend
- Deploy the metadata service for team-wide run tracking and artifact sharing
- Use @conda or @pypi decorators to pin per-step dependencies automatically
- Integrate with Argo Workflows or AWS Step Functions for production scheduling
Key Features
- Decorator-based API keeps flow definitions in plain Python without YAML or config files
- Automatic data versioning lets you inspect or compare any historical run
- @resources decorator requests specific CPU, memory, or GPU for individual steps
- Fan-out with foreach enables parallel processing across data partitions
- Built-in resume from the last successful step after failures
Comparison with Similar Tools
- Prefect — Python workflow engine; more general-purpose, less ML-specific artifact management
- Dagster — asset-centric orchestrator; stronger typing but heavier abstraction layer
- Kedro — pipeline framework for data science; more opinionated project structure
- Airflow — DAG scheduler for batch jobs; requires more infrastructure and is less Python-native
FAQ
Q: Do I need AWS to use Metaflow? A: No. Metaflow runs fully locally. AWS and Kubernetes integrations are optional for scaling.
Q: How does data versioning work? A: Every step's output artifacts are automatically persisted and tagged with the run ID. You can retrieve any artifact from any past run via the client API.
Q: Can I schedule flows for recurring execution? A: Yes. Integrate with Argo Workflows, AWS Step Functions, or any cron-based trigger to run flows on a schedule.
Q: Does Metaflow handle GPU workloads? A: Yes. Use the @resources(gpu=1) decorator to request GPU instances for specific steps.