
Skills · Apr 16, 2026 · 3 min read

Kedro — Production-Ready ML Pipeline Framework for Python

Kedro is an open-source Python framework by McKinsey QuantumBlack that applies software engineering best practices to data science and ML pipelines. It provides a standardized project structure, data catalog, and pipeline abstraction that makes experimental code production-ready.

AI Open Source · Community

Agent-ready

This asset can be installed once you choose a runtime, review the install plan, and run the matching command.

Native · 98/100 · Policy: allow

Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established
Entry point: Kedro Overview

Direct install command:

npx -y tokrepo@latest install a9468d16-39eb-11f1-9bc6-00163e2b0d79 --target codex

Run this after confirming the plan with a dry run.

TL;DR

Open-source Python framework by QuantumBlack that turns messy notebook code into production-ready ML pipelines.

§01

What it is

Kedro is an open-source Python framework created by McKinsey QuantumBlack that applies software engineering best practices to data science and machine learning code. It provides a standardized project structure, a declarative data catalog, and a pipeline abstraction that transforms experimental notebook code into maintainable, testable, production-ready pipelines.

Kedro targets data scientists and ML engineers who need to bridge the gap between prototype notebooks and production systems. It works alongside existing tools like pandas, scikit-learn, and PySpark without replacing them.

§02

How it saves time or tokens

Kedro shortens the "notebook to production" refactoring cycle. The standardized project template gives every project the same layout, so new team members can find pipelines, configuration, and data without a walkthrough. The data catalog decouples data access from business logic: switching between local CSV files and cloud storage means changing a YAML entry, not Python code. Kedro-Viz (kedro viz run) generates an interactive data-flow diagram directly from the pipeline definition, so no diagram code has to be written or kept in sync by hand.
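
The decoupling can be illustrated without Kedro installed. The sketch below is a hypothetical, stdlib-only model of a data catalog, not Kedro's DataCatalog API: ToyCatalog, LOADERS, load_csv, load_json, and count_companies are invented names, and the "csv"/"json" type keys merely stand in for Kedro dataset types such as pandas.CSVDataset. The node function only ever sees the logical name companies, so moving the data to another format or location changes the config dict and nothing else.

```python
import csv
import json
import os
import tempfile
from typing import Any, Callable, Dict, List


# Hypothetical loaders keyed by a "type" string; a toy stand-in for
# Kedro dataset types, not Kedro's real API.
def load_csv(path: str) -> List[Dict[str, str]]:
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def load_json(path: str) -> List[Dict[str, Any]]:
    with open(path) as f:
        return json.load(f)


LOADERS: Dict[str, Callable[[str], Any]] = {"csv": load_csv, "json": load_json}


class ToyCatalog:
    """Maps logical dataset names to {"type": ..., "filepath": ...} entries."""

    def __init__(self, config: Dict[str, Dict[str, str]]) -> None:
        self._config = config

    def load(self, name: str) -> Any:
        entry = self._config[name]
        return LOADERS[entry["type"]](entry["filepath"])


def count_companies(companies: List[Dict[str, Any]]) -> int:
    """Business logic: knows nothing about file formats or locations."""
    return len(companies)


if __name__ == "__main__":
    rows = [{"id": "1", "company_rating": "0.9"}, {"id": "2", "company_rating": "0.7"}]
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = os.path.join(tmp, "companies.csv")
        json_path = os.path.join(tmp, "companies.json")
        with open(csv_path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=["id", "company_rating"])
            writer.writeheader()
            writer.writerows(rows)
        with open(json_path, "w") as f:
            json.dump(rows, f)

        # Same node, two storage configs: only the catalog entry differs.
        as_csv = ToyCatalog({"companies": {"type": "csv", "filepath": csv_path}})
        as_json = ToyCatalog({"companies": {"type": "json", "filepath": json_path}})
        print(count_companies(as_csv.load("companies")))   # 2
        print(count_companies(as_json.load("companies")))  # 2
        print(as_csv.load("companies") == as_json.load("companies"))  # True
```

In a real Kedro project the equivalent entries live in conf/base/catalog.yml, and the framework, not the node, performs the load and save.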

§03

How to use

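Install Kedro, create a new project from the spaceflights-pandas starter, and install the starter's dependencies:

pip install kedro
kedro new --starter=spaceflights-pandas
cd spaceflights-pandas  # folder name follows the project name entered at the prompt
pip install -r requirements.txt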

Run the pipeline:

kedro run

Visualize the pipeline graph (Kedro-Viz ships as a separate package):

pip install kedro-viz
kedro viz run
# Opens the browser at localhost:4141

§04

Example

Define a pipeline node that transforms data:

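# src/project/pipelines/data_processing/nodes.py
import pandas as pd


def preprocess_companies(companies: pd.DataFrame) -> pd.DataFrame:
    """Fill missing company ratings with the column mean.

    Assumes company_rating is already numeric. The node is a plain
    function with no Kedro imports, so it can be unit-tested on its own.
    """
    companies = companies.copy()  # avoid mutating the loaded input
    companies['company_rating'] = companies['company_rating'].fillna(
        companies['company_rating'].mean()
    )
    return companies

# src/project/pipelines/data_processing/pipeline.py
from kedro.pipeline import Pipeline, node

from .nodes import preprocess_companies


def create_pipeline(**kwargs) -> Pipeline:
    # inputs/outputs are logical dataset names resolved through the data catalog
    return Pipeline([
        node(
            func=preprocess_companies,
            inputs='companies',
            outputs='preprocessed_companies',
            name='preprocess_companies_node',
        ),
    ])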

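The names 'companies' and 'preprocessed_companies' are logical dataset names. The data catalog YAML (conf/base/catalog.yml) maps them to physical storage such as local files or cloud object stores.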
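
The node above is connected to the rest of the project only through the dataset names it reads and writes. Below is a hypothetical, stdlib-only sketch of how a runner can exploit those names; it is a teaching model, not Kedro's SequentialRunner, and ToyNode, run_pipeline, fill_missing_ratings, and average are invented for illustration. It orders nodes by their inputs and outputs, requires the caller to supply only the free inputs (names that no node produces), and keeps every intermediate result in an in-memory dict.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ToyNode:
    """Hypothetical stand-in for a Kedro node: a function plus dataset names."""

    func: Callable[..., Any]
    inputs: Tuple[str, ...]
    output: str


def run_pipeline(nodes: List[ToyNode], supplied: Dict[str, Any]) -> Dict[str, Any]:
    """Run nodes in dependency order, matching them only by dataset name."""
    produced = {n.output for n in nodes}
    # Free inputs are consumed by some node but produced by none.
    free_inputs = {name for n in nodes for name in n.inputs} - produced
    missing = free_inputs - set(supplied)
    if missing:
        raise ValueError(f"Free inputs not supplied: {sorted(missing)}")

    data = dict(supplied)  # intermediate results are kept in memory only
    pending = list(nodes)
    while pending:
        ready = [n for n in pending if all(i in data for i in n.inputs)]
        if not ready:
            raise ValueError("Cyclic or unsatisfiable node dependencies")
        for n in ready:
            data[n.output] = n.func(*(data[i] for i in n.inputs))
            pending.remove(n)
    return data


def fill_missing_ratings(ratings: List[Optional[float]]) -> List[float]:
    """Replace None with the mean of the known ratings."""
    known = [r for r in ratings if r is not None]
    mean = sum(known) / len(known)
    return [mean if r is None else r for r in ratings]


def average(values: List[float]) -> float:
    return sum(values) / len(values)


if __name__ == "__main__":
    # Nodes are listed out of order on purpose; the runner resolves the order.
    nodes = [
        ToyNode(average, ("preprocessed_ratings",), "mean_rating"),
        ToyNode(fill_missing_ratings, ("ratings",), "preprocessed_ratings"),
    ]
    result = run_pipeline(nodes, {"ratings": [4.0, None, 2.0]})
    print(result["preprocessed_ratings"])  # [4.0, 3.0, 2.0]
    print(result["mean_rating"])  # 3.0
```

Because execution order comes from the dataset names rather than list position, nodes can be declared in any order, which mirrors how Kedro pipelines are assembled.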

§05

Related on TokRepo

AI Tools for Coding — Development tools that complement ML pipeline frameworks
AI Tools for DevOps — CI/CD and deployment tools for ML pipeline orchestration

§06

Common pitfalls

Kedro is a pipeline framework, not an orchestrator. For scheduled execution, pair it with Airflow, Prefect, or Argo; Kedro's deployment guides cover these targets, and plugins such as kedro-airflow convert a Kedro pipeline into an Airflow DAG.
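Every free pipeline input (a dataset that no node produces) must be registered in catalog.yml, or kedro run fails with a "not found in the DataCatalog" error. Intermediate outputs that are not registered are held in memory only and are lost when the run ends, so register any result you need to keep.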
Pipeline visualization with kedro viz requires installing the kedro-viz plugin separately. It is not included in the core package.
Read the release notes and migration guides before upgrading Kedro in production; for example, Kedro 0.19 removed the older *DataSet class spellings in favor of *Dataset.
For team projects, keep shared configuration in conf/base and machine-specific settings and credentials in conf/local, which the project template excludes from version control.

Frequently asked questions

What is the difference between Kedro and Airflow?

Kedro is a pipeline authoring framework focused on code organization, data management, and reproducibility. Airflow is a workflow orchestrator focused on scheduling and monitoring. They complement each other: you write pipelines in Kedro and deploy them to Airflow for scheduled execution.

Does Kedro work with PySpark?

Yes. PySpark support comes through the data catalog: the kedro-datasets package provides spark.SparkDataset, which you declare in catalog.yml so that pipeline nodes receive and return Spark DataFrames. Moving from pandas prototypes to Spark is then mostly a catalog change, although node code written against the pandas API still has to be ported to the PySpark DataFrame API.

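Who maintains Kedro?

Kedro was created at McKinsey QuantumBlack, McKinsey's AI and data science division, where it began as an internal tool before being open-sourced for the broader data science community. It is now a project of the LF AI & Data Foundation, and the QuantumBlack team continues to lead its development.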

Can I use Kedro for non-ML data pipelines?

Yes. While Kedro was designed for ML workflows, its pipeline and data catalog abstractions work for any data processing task. ETL pipelines, reporting pipelines, and data quality checks all fit the Kedro model.

How does the Kedro data catalog work?

The data catalog is a YAML file that maps logical dataset names to physical storage locations and formats. Your pipeline code references logical names only. Switching from a local CSV to S3 Parquet requires changing the catalog entry, not your Python code.

Sources cited (3)

Kedro GitHub: Kedro is an open-source Python framework by McKinsey QuantumBlack
Kedro Documentation: Standardized project structure with data catalog and pipeline visualization
Kedro Deployment Docs: Integration with Airflow, Prefect, and other orchestrators

Similar assets

ZenML — MLOps Pipeline Framework from Development to Production

An open-source MLOps framework that lets you build portable, production-ready ML pipelines that run on any infrastructure stack.

Skills

Script Depot

Dropwizard — Production-Ready Java REST Framework

An opinionated Java framework that bundles Jetty, Jersey, Jackson, and Metrics into a single package for building RESTful web services.

Skills

AI Open Source

PyTorch — The Deep Learning Framework for Research and Production

PyTorch is an open-source deep learning framework by Meta that provides tensor computation with GPU acceleration and automatic differentiation. Its dynamic computation graph and Pythonic API make it the dominant framework for AI research and increasingly for production.

Skills

Script Depot

Pydantic AI — Production AI Agent Framework

Build production-ready AI agents in Python with type-safe structured outputs, dependency injection, and multi-model support. By the creators of Pydantic.

Skills

Pydantic