Apr 24, 2026 · 3 min read

Metaflow — Human-Friendly ML Workflow Framework by Netflix

Metaflow is a Python framework from Netflix for building and managing real-life data science and ML projects, handling compute, data versioning, and orchestration with minimal boilerplate.

Introduction

Metaflow was built at Netflix to let data scientists write production ML pipelines using regular Python. It manages infrastructure concerns—versioning, compute scaling, dependency management—behind a simple decorator-based API, so teams can focus on modeling rather than plumbing.

What Metaflow Does

  • Structures ML projects as flows with steps connected by a DAG
  • Automatically versions every run's data, code, and dependencies
  • Scales individual steps to cloud compute (AWS Batch, Kubernetes) with a single decorator
  • Provides a built-in client for inspecting past runs and retrieving artifacts
  • Supports branching and joining for parallel workloads within a flow

Architecture Overview

A Metaflow flow is a Python class where each method decorated with @step becomes a node in a DAG. When executed, the runtime snapshots code, data artifacts, and environment metadata for each step. Steps can be dispatched to local processes, AWS Batch, or Kubernetes. A metadata service tracks all runs, and a datastore (S3 or local filesystem) persists artifacts so any past result can be retrieved programmatically.

Self-Hosting & Configuration

  • Install from PyPI for local execution with no extra infrastructure
  • Configure AWS integration by running metaflow configure aws for S3 and Batch
  • Deploy the metadata service for team-wide run tracking and artifact sharing
  • Use @conda or @pypi decorators to pin per-step dependencies automatically
  • Integrate with Argo Workflows or AWS Step Functions for production scheduling

Key Features

  • Decorator-based API keeps flow definitions in plain Python without YAML or config files
  • Automatic data versioning lets you inspect or compare any historical run
  • @resources decorator requests specific CPU, memory, or GPU for individual steps
  • Fan-out with foreach enables parallel processing across data partitions
  • Built-in resume from the last successful step after failures

Comparison with Similar Tools

  • Prefect — Python workflow engine; more general-purpose, less ML-specific artifact management
  • Dagster — asset-centric orchestrator; stronger typing but heavier abstraction layer
  • Kedro — pipeline framework for data science; more opinionated project structure
  • Airflow — DAG scheduler for batch jobs; requires more infrastructure and is less Python-native

FAQ

Q: Do I need AWS to use Metaflow? A: No. Metaflow runs fully locally. AWS and Kubernetes integrations are optional for scaling.

Q: How does data versioning work? A: Every step's output artifacts are automatically persisted and tagged with the run ID. You can retrieve any artifact from any past run via the client API.

Q: Can I schedule flows for recurring execution? A: Yes. Integrate with Argo Workflows, AWS Step Functions, or any cron-based scheduler to trigger flows on a schedule.

Q: Does Metaflow handle GPU workloads? A: Yes. Use the @resources(gpu=1) decorator to request GPU instances for specific steps.
