Configs · April 16, 2026 · 1 min read

Kedro — Production-Ready ML Pipeline Framework for Python

Kedro is an open-source Python framework by McKinsey QuantumBlack that applies software engineering best practices to data science and ML pipelines. It provides a standardized project structure, data catalog, and pipeline abstraction that makes experimental code production-ready.

Introduction

Kedro bridges the gap between messy notebook experiments and maintainable production pipelines. Created by QuantumBlack (a McKinsey company) and now hosted by the LF AI & Data Foundation, it enforces a consistent project template, separates configuration from code, and makes pipelines reproducible without tying you to any particular orchestrator.

What Kedro Does

  • Provides a cookiecutter-style project template that standardizes ML project layout
  • Abstracts data access through a declarative YAML-based Data Catalog
  • Defines pipelines as DAGs whose nodes are pure Python functions
  • Generates interactive pipeline visualizations with Kedro-Viz
  • Deploys to any orchestrator (Airflow, Prefect, Vertex AI, Databricks) via plugins
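The node-and-pipeline style looks roughly like this. The functions and dataset names below (`split_data`, `model_input`, `test_ratio`) are illustrative, not from a real project; the Kedro wiring is shown in comments so the plain functions remain runnable without Kedro installed:

```python
# Nodes are plain Python functions; Kedro wires them into a DAG by
# matching output dataset names to input dataset names.

def split_data(rows, test_ratio):
    """Split a list of rows into train and test partitions."""
    cut = int(len(rows) * (1 - test_ratio))
    return rows[:cut], rows[cut:]

def count_rows(train):
    """A second node consuming the first node's output."""
    return len(train)

# With `pip install kedro`, the wiring would look like:
#
# from kedro.pipeline import node, pipeline
# data_pipeline = pipeline([
#     node(split_data, inputs=["model_input", "params:test_ratio"],
#          outputs=["train", "test"], name="split"),
#     node(count_rows, inputs="train", outputs="train_count"),
# ])
```

Because nodes are pure functions, they can be unit-tested in isolation before being wired into a pipeline.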

Architecture Overview

A Kedro project consists of nodes (Python functions), pipelines (DAGs of nodes), and a Data Catalog (YAML that maps logical dataset names to physical storage). A runner executes the pipeline sequentially (SequentialRunner) or concurrently (ThreadRunner, ParallelRunner), and deployment plugins translate pipelines for external orchestrators. Configuration is layered by environment (e.g. base, local, prod) so credentials and parameters stay separate from code.
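A minimal toy sketch of this execution model (not Kedro's actual internals): a catalog maps dataset names to data, and a sequential runner resolves each node's inputs from the catalog and writes its outputs back:

```python
# Toy illustration of Kedro's execution model, not Kedro's real code.
# The "catalog" is just a dict from dataset name to data.

def run_sequential(nodes, catalog):
    """nodes: list of (func, input_names, output_names) in DAG order."""
    for func, inputs, outputs in nodes:
        results = func(*[catalog[name] for name in inputs])
        if len(outputs) == 1:          # normalize single outputs
            results = (results,)
        catalog.update(zip(outputs, results))
    return catalog

# Two tiny "nodes" chained purely by dataset name:
nodes = [
    (lambda xs: [x * 2 for x in xs], ["raw"], ["doubled"]),
    (sum, ["doubled"], ["total"]),
]
catalog = run_sequential(nodes, {"raw": [1, 2, 3]})
```

The real framework adds lazy I/O, hooks, and topological sorting, but the principle is the same: nodes never touch storage directly; the catalog mediates all reads and writes.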

Self-Hosting & Configuration

  • Install via pip or conda and scaffold a project with kedro new
  • Define datasets in conf/base/catalog.yml pointing to local files, S3, GCS, or databases
  • Store credentials in conf/local/credentials.yml which is gitignored by default
  • Declare run parameters in conf/base/parameters.yml so experiment settings stay out of code
  • Deploy to Airflow with kedro-airflow or to Databricks with kedro-databricks plugin
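A catalog entry and a parameters file might look like the fragment below. The dataset names, bucket, and paths are illustrative; the dataset type strings come from the kedro-datasets package (older releases spell them `pandas.CSVDataSet`):

```yaml
# conf/base/catalog.yml -- names and paths are illustrative
companies:
  type: pandas.CSVDataset
  filepath: s3://my-bucket/raw/companies.csv

model_input:
  type: pandas.ParquetDataset
  filepath: data/03_primary/model_input.parquet

# conf/base/parameters.yml
test_ratio: 0.2
```

Nodes then refer to `companies` or `params:test_ratio` by name, and swapping S3 for local disk is a one-line catalog change with no code edits.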

Key Features

  • Declarative Data Catalog decouples I/O from business logic
  • Modular pipeline design encourages reuse across projects
  • Kedro-Viz provides interactive DAG visualization with experiment tracking
  • Built-in dataset versioning for reproducibility
  • Extensive plugin ecosystem for deployment, linting, and testing
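Dataset versioning, for example, is a single catalog flag (the dataset name and path here are illustrative):

```yaml
model:
  type: pickle.PickleDataset
  filepath: data/06_models/model.pkl
  versioned: true   # each save goes to a timestamped subdirectory
```

Loads default to the latest version, and a specific timestamp can be requested at run time for reproducibility.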

Comparison with Similar Tools

  • Prefect — workflow orchestrator focused on scheduling; Kedro is a pipeline framework that feeds into orchestrators
  • DVC — data version control tool; Kedro manages pipeline structure and data access patterns
  • Metaflow — Netflix framework with strong compute abstraction; Kedro focuses on project structure and portability
  • ZenML — MLOps framework with stack abstraction; Kedro is lighter and more opinionated on project layout
  • Luigi — older pipeline library; Kedro offers modern packaging, catalog, and visualization

FAQ

Q: Is Kedro an orchestrator? A: No. Kedro defines pipelines; orchestrators like Airflow or Prefect schedule and monitor them. Kedro provides deployment plugins for popular orchestrators.

Q: Can I use Kedro with Jupyter notebooks? A: Yes. Kedro ships a Jupyter integration that loads the catalog and context so you can explore data interactively and then refactor into nodes.

Q: Does Kedro support distributed computing? A: Kedro nodes can use Spark, Dask, or Ray internally. The framework orchestrates the DAG; the compute engine handles scale.

Q: Who uses Kedro in production? A: Companies like Telus, QuantumBlack, Walmart, and NASA JPL use Kedro to standardize their ML workflows.
