Configs · Apr 30, 2026 · 3 min read

Papermill — Parameterize and Execute Jupyter Notebooks

Papermill is a Python tool for parameterizing, executing, and analyzing Jupyter notebooks programmatically, enabling notebook-based pipelines and report generation.

Introduction

Papermill is a tool for parameterizing and executing Jupyter notebooks from the command line or Python code. It treats notebooks as functions that accept parameters and produce output notebooks with results, making it possible to build reproducible data pipelines, automated reports, and batch experiments using familiar notebook workflows.

What Papermill Does

  • Executes Jupyter notebooks with injected parameters from CLI or Python API
  • Produces output notebooks containing cell outputs, errors, and execution metadata
  • Supports reading and writing notebooks from local disk, S3, GCS, Azure Blob, and HDFS
  • Records execution duration and status for each cell in the output notebook
  • Integrates with workflow orchestrators like Airflow, Dagster, and Prefect

Architecture Overview

Papermill reads an input notebook, locates a cell tagged with the "parameters" tag, and injects a new cell immediately after it with the provided parameter values. It then executes the entire notebook using the configured Jupyter kernel (Python, R, Julia, Scala, etc.) via nbclient. Each cell's output is captured and written to the output notebook file. The storage layer uses pluggable I/O handlers, allowing notebooks to be read from and written to cloud object stores. Error handling can be configured to raise exceptions on cell failures or continue execution.
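The injection step described above can be sketched in plain Python; this is illustrative pseudocode over notebook-shaped dicts, not Papermill's actual implementation.

```python
# Illustrative sketch (not papermill's real code) of the injection step:
# find the cell tagged "parameters" and insert a new code cell after it
# whose source assigns the supplied parameter values.

def inject_parameters(cells, params):
    """Return a new cell list with an 'injected-parameters' cell added."""
    injected = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "source": "\n".join(f"{k} = {v!r}" for k, v in params.items()),
    }
    for i, cell in enumerate(cells):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            return cells[: i + 1] + [injected] + cells[i + 1 :]
    # No tagged cell found: fall back to injecting at the top.
    return [injected] + cells

cells = [
    {"cell_type": "code", "metadata": {"tags": ["parameters"]}, "source": "alpha = 0.1"},
    {"cell_type": "code", "metadata": {}, "source": "print(alpha * 2)"},
]
new = inject_parameters(cells, {"alpha": 0.5})
print(new[1]["source"])  # → alpha = 0.5
```

Because the injected cell comes after the tagged one, the defaults in the "parameters" cell are simply shadowed by the new assignments when the notebook runs.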

Self-Hosting & Configuration

  • Install via pip with optional cloud storage extras: pip install papermill[s3,gcs,azure]
  • Tag a notebook cell as "parameters" using the Jupyter cell toolbar to mark the injection point
  • Configure kernel name with -k flag to execute with non-default kernels
  • Save partial output after each cell with --request-save-on-cell-execute, and cap per-cell runtime with --execution-timeout for long-running jobs
  • Use environment variables or YAML files for parameter sets in batch execution

Key Features

  • CLI and Python API for flexible integration into scripts and pipelines
  • Cloud-native storage support for S3, GCS, Azure Blob, and HDFS
  • Works with any Jupyter kernel including Python, R, Julia, and Scala
  • Captures rich cell outputs (tables, charts, HTML) in the output notebook
  • Pairs with scrapbook library for extracting data and figures from executed notebooks

Comparison with Similar Tools

  • nbconvert — converts notebooks to HTML/PDF but does not parameterize or re-execute
  • Ploomber — notebook pipeline orchestrator with DAG support, broader scope
  • Dagstermill — Dagster integration for notebooks, uses Papermill under the hood
  • Jupyter Scheduler — JupyterLab extension for scheduled runs, less programmatic control
  • Marimo — reactive notebook format, different paradigm from traditional Jupyter notebooks

FAQ

Q: What notebook formats does Papermill support? A: Papermill works with standard .ipynb files (nbformat v4). Any notebook compatible with Jupyter is supported.

Q: Can I run Papermill in a CI/CD pipeline? A: Yes. Papermill is commonly used in CI/CD for automated report generation and notebook testing. It returns a non-zero exit code on cell execution failures.

Q: How do I pass complex parameters like lists or dicts? A: Use the -y flag with YAML strings or -f with a YAML parameter file for complex data types.
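A parameter file for -f might look like the following (keys and values are hypothetical); each top-level key becomes a variable in the injected cell.

```yaml
# params.yaml — used as: papermill input.ipynb output.ipynb -f params.yaml
regions:
  - emea
  - apac
thresholds:
  warn: 0.8
  fail: 0.95
run_date: "2024-01-01"
```

Inside the notebook, regions arrives as a Python list and thresholds as a dict.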

Q: Does Papermill support parallel execution? A: Papermill executes one notebook at a time. For parallel execution, use a workflow orchestrator like Airflow or run multiple Papermill processes.
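A common fan-out pattern uses concurrent.futures. In the sketch below, execute is a stand-in for papermill.execute_notebook so the example stays self-contained; threads are sufficient in practice because each Papermill run launches its own kernel subprocess.

```python
# Sketch: launch several parameterized runs concurrently, producing one
# output notebook per parameter set. `execute` is a stand-in for
# papermill.execute_notebook(input_path, output_path, parameters=...).
from concurrent.futures import ThreadPoolExecutor

def execute(input_path, output_path, parameters):
    # Real pipeline: pm.execute_notebook(input_path, output_path, parameters=parameters)
    return output_path

regions = ["emea", "apac", "amer"]
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [
        pool.submit(execute, "report.ipynb", f"report_{r}.ipynb", {"region": r})
        for r in regions
    ]
    outputs = [f.result() for f in futures]

print(outputs)  # one executed notebook per region
```

Each run must write to a distinct output path; beyond that the runs are independent, which is what makes orchestrators like Airflow a natural fit for larger fan-outs.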
