Configs · Apr 30, 2026 · 3 min read

Papermill — Parameterize and Execute Jupyter Notebooks

Papermill is a Python tool for parameterizing, executing, and analyzing Jupyter notebooks programmatically, enabling notebook-based pipelines and report generation.

Introduction

Papermill is a tool for parameterizing and executing Jupyter notebooks from the command line or Python code. It treats notebooks as functions that accept parameters and produce output notebooks with results, making it possible to build reproducible data pipelines, automated reports, and batch experiments using familiar notebook workflows.

What Papermill Does

  • Executes Jupyter notebooks with injected parameters from CLI or Python API
  • Produces output notebooks containing cell outputs, errors, and execution metadata
  • Supports reading and writing notebooks from local disk, S3, GCS, Azure Blob, and HDFS
  • Records execution duration and status for each cell in the output notebook
  • Integrates with workflow orchestrators like Airflow, Dagster, and Prefect

Architecture Overview

Papermill reads an input notebook, locates a cell tagged with the "parameters" tag, and injects a new cell immediately after it with the provided parameter values. It then executes the entire notebook using the configured Jupyter kernel (Python, R, Julia, Scala, etc.) via nbclient. Each cell's output is captured and written to the output notebook file. The storage layer uses pluggable I/O handlers, allowing notebooks to be read from and written to cloud object stores. Error handling can be configured to raise exceptions on cell failures or continue execution.
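The injection step described above can be sketched in plain Python; this is illustrative pseudocode over notebook-shaped dicts, not Papermill's actual implementation.

```python
# Illustrative sketch (not papermill's real code) of the injection step:
# find the cell tagged "parameters" and insert a new code cell after it
# whose source assigns the supplied parameter values.

def inject_parameters(cells, params):
    """Return a new cell list with an 'injected-parameters' cell added."""
    injected = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "source": "\n".join(f"{k} = {v!r}" for k, v in params.items()),
    }
    for i, cell in enumerate(cells):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            return cells[: i + 1] + [injected] + cells[i + 1 :]
    # No tagged cell found: fall back to injecting at the top.
    return [injected] + cells

cells = [
    {"cell_type": "code", "metadata": {"tags": ["parameters"]}, "source": "alpha = 0.1"},
    {"cell_type": "code", "metadata": {}, "source": "print(alpha * 2)"},
]
new = inject_parameters(cells, {"alpha": 0.5})
print(new[1]["source"])  # → alpha = 0.5
```

Because the injected cell comes after the tagged one, the defaults in the "parameters" cell are simply shadowed by the new assignments when the notebook runs.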

Self-Hosting & Configuration

  • Install via pip with optional cloud storage extras: pip install papermill[s3,gcs,azure]
  • Tag a notebook cell as "parameters" using the Jupyter cell toolbar to mark the injection point
  • Configure kernel name with -k flag to execute with non-default kernels
  • Save partial output after each cell with --request-save-on-cell-execute, and cap per-cell runtime with --execution-timeout for long-running jobs
  • Use environment variables or YAML files for parameter sets in batch execution

Key Features

  • CLI and Python API for flexible integration into scripts and pipelines
  • Cloud-native storage support for S3, GCS, Azure Blob, and HDFS
  • Works with any Jupyter kernel including Python, R, Julia, and Scala
  • Captures rich cell outputs (tables, charts, HTML) in the output notebook
  • Pairs with scrapbook library for extracting data and figures from executed notebooks

Comparison with Similar Tools

  • nbconvert — converts notebooks to HTML/PDF but does not parameterize or re-execute
  • Ploomber — notebook pipeline orchestrator with DAG support, broader scope
  • Dagstermill — Dagster integration for notebooks, uses Papermill under the hood
  • Jupyter Scheduler — JupyterLab extension for scheduled runs, less programmatic control
  • Marimo — reactive notebook format, different paradigm from traditional Jupyter notebooks

FAQ

Q: What notebook formats does Papermill support? A: Papermill works with standard .ipynb files (nbformat v4). Any notebook compatible with Jupyter is supported.

Q: Can I run Papermill in a CI/CD pipeline? A: Yes. Papermill is commonly used in CI/CD for automated report generation and notebook testing. It returns a non-zero exit code on cell execution failures.

Q: How do I pass complex parameters like lists or dicts? A: Use the -y flag with YAML strings or -f with a YAML parameter file for complex data types.
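A parameter file for -f might look like the following (keys and values are hypothetical); each top-level key becomes a variable in the injected cell.

```yaml
# params.yaml — used as: papermill input.ipynb output.ipynb -f params.yaml
regions:
  - emea
  - apac
thresholds:
  warn: 0.8
  fail: 0.95
run_date: "2024-01-01"
```

Inside the notebook, regions arrives as a Python list and thresholds as a dict.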

Q: Does Papermill support parallel execution? A: Papermill executes one notebook at a time. For parallel execution, use a workflow orchestrator like Airflow or run multiple Papermill processes.
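A common fan-out pattern uses concurrent.futures. In the sketch below, execute is a stand-in for papermill.execute_notebook so the example stays self-contained; threads are sufficient in practice because each Papermill run launches its own kernel subprocess.

```python
# Sketch: launch several parameterized runs concurrently, producing one
# output notebook per parameter set. `execute` is a stand-in for
# papermill.execute_notebook(input_path, output_path, parameters=...).
from concurrent.futures import ThreadPoolExecutor

def execute(input_path, output_path, parameters):
    # Real pipeline: pm.execute_notebook(input_path, output_path, parameters=parameters)
    return output_path

regions = ["emea", "apac", "amer"]
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [
        pool.submit(execute, "report.ipynb", f"report_{r}.ipynb", {"region": r})
        for r in regions
    ]
    outputs = [f.result() for f in futures]

print(outputs)  # one executed notebook per region
```

Each run must write to a distinct output path; beyond that the runs are independent, which is what makes orchestrators like Airflow a natural fit for larger fan-outs.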
