Configs · Apr 16, 2026 · 3 min read

dlt — Data Load Tool for Python ELT Pipelines

dlt (data load tool) is an open-source Python library that simplifies building ELT pipelines. Define a source as a Python generator, pick a destination, and dlt handles schema inference, incremental loading, normalization, and state management automatically.

TL;DR
dlt simplifies ELT pipelines: define a Python generator source, pick a destination, and load.
§01

What it is

dlt (data load tool) is an open-source Python library that simplifies building ELT pipelines. You define a data source as a Python generator or function, choose a destination (BigQuery, Snowflake, DuckDB, PostgreSQL, and others), and dlt handles schema inference, incremental loading, data normalization, and state management automatically.

dlt targets data engineers and analysts who want to build production data pipelines in pure Python without learning a new framework or YAML DSL. It is lightweight, embeddable, and works in scripts, notebooks, and orchestrators alike.

§02

How it saves time or tokens

dlt eliminates the boilerplate of data loading: schema creation, type mapping, incremental state tracking, and nested JSON flattening. A pipeline that would take hundreds of lines with raw SQL and API calls becomes a few lines of Python. The automatic schema inference means you do not need to pre-define table schemas; dlt creates and evolves them based on the data it sees.
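
For instance, a couple of nested records is enough to see the flattening at work. The sketch below uses made-up data and placeholder names (normalize_demo, customers); dlt turns the nested address dict into prefixed columns and the orders list into a child table:

import dlt

# Hypothetical nested records: 'address' is a nested dict, 'orders' a nested list
customers = [
    {'id': 1, 'name': 'Ada', 'address': {'city': 'Berlin', 'zip': '10115'},
     'orders': [{'sku': 'A-1', 'qty': 2}, {'sku': 'B-7', 'qty': 1}]},
    {'id': 2, 'name': 'Grace', 'address': {'city': 'Paris', 'zip': '75001'},
     'orders': []},
]

pipeline = dlt.pipeline(
    pipeline_name='normalize_demo',
    destination='duckdb',
    dataset_name='demo'
)

# dlt infers the schema: nested dict fields become columns such as address__city,
# and the nested list becomes a customers__orders child table
print(pipeline.run(customers, table_name='customers'))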

For AI workflows, dlt makes it easy to load API responses from LLM providers, vector databases, or analytics services into a data warehouse for analysis and reporting.

§03

How to use

  1. Install dlt together with the extra for your destination: pip install "dlt[bigquery]" (or "dlt[duckdb]", "dlt[snowflake]", etc.).
  2. Define a source function that yields data. Use @dlt.resource to mark it as a loadable resource.
  3. Create a pipeline, connect it to your destination, and run it. dlt infers the schema, creates tables, and loads the data.
§04

Example

import dlt
import requests

# Merge on the issue id so repeated runs update existing rows instead of duplicating them
@dlt.resource(write_disposition='merge', primary_key='id')
def github_issues():
    response = requests.get(
        'https://api.github.com/repos/dlt-hub/dlt/issues',
        params={'state': 'open', 'per_page': 100}
    )
    response.raise_for_status()
    # Yield the list of issue dicts; dlt infers the table schema from them
    yield response.json()

# The pipeline ties the source to a destination and a dataset (schema) name
pipeline = dlt.pipeline(
    pipeline_name='github_pipeline',
    destination='duckdb',
    dataset_name='github_data'
)

load_info = pipeline.run(github_issues)
print(load_info)

This pipeline fetches GitHub issues, creates a DuckDB table with inferred schema, and merges new data on subsequent runs using the id primary key.

§05

Common pitfalls

  • Schema inference works well for consistent data shapes. Highly variable JSON structures may produce wide tables with many nullable columns. Pre-select the fields you yield, or add column hints on the resource, to keep tables narrow.
  • Incremental loading requires a cursor field (such as updated_at). Without one, dlt reloads all data on each run. Declare the cursor with dlt.sources.incremental('updated_at'), typically as a resource argument, to enable incremental behavior; see the sketch after this list.
  • dlt pipelines run in-process by default. For large-scale production workloads, run them inside an orchestrator (Dagster, Airflow, Prefect) for scheduling, monitoring, and retry handling.
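
A minimal sketch of that incremental pattern, reusing the GitHub issues resource from the example above; the updated_at cursor field and the since request parameter are assumptions about this particular API:

import dlt
import requests

@dlt.resource(write_disposition='merge', primary_key='id')
def github_issues(
    updated_at=dlt.sources.incremental('updated_at', initial_value='1970-01-01T00:00:00Z')
):
    # dlt persists the highest updated_at seen so far; pass it upstream so the
    # API only returns issues changed since the previous run
    response = requests.get(
        'https://api.github.com/repos/dlt-hub/dlt/issues',
        params={'state': 'open', 'per_page': 100, 'since': updated_at.last_value}
    )
    response.raise_for_status()
    yield response.json()

pipeline = dlt.pipeline(
    pipeline_name='github_incremental',
    destination='duckdb',
    dataset_name='github_data'
)
print(pipeline.run(github_issues))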

Frequently Asked Questions

What destinations does dlt support?

dlt supports BigQuery, Snowflake, DuckDB, PostgreSQL, Redshift, Databricks, MotherDuck, Synapse, filesystem (Parquet/CSV), and others. Each destination is installed as a separate Python package and handles connection management, schema creation, and data type mapping.

How does dlt handle schema changes?

dlt detects schema changes automatically. New columns are added to existing tables. Column type changes are handled according to configurable evolution policies. You can choose to discard new columns, evolve the schema, or raise an error on schema drift.
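
A minimal sketch of those policies using dlt's schema contracts; the modes below (evolve, freeze, and the discard options) are illustrative, so check the docs for the options your dlt version supports:

import dlt

pipeline = dlt.pipeline(
    pipeline_name='contract_demo',
    destination='duckdb',
    dataset_name='demo'
)

# 'evolve' accepts new tables and columns, 'freeze' raises on data type drift;
# 'discard_value' and 'discard_row' silently drop the offending data instead
pipeline.run(
    [{'id': 1, 'name': 'Ada', 'new_field': 'appeared later'}],
    table_name='users',
    schema_contract={'tables': 'evolve', 'columns': 'evolve', 'data_type': 'freeze'}
)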

Can dlt do incremental loading?

Yes. Use the incremental parameter on a resource to specify a cursor field (e.g., updated_at). dlt tracks the last loaded value and only fetches new records on subsequent runs. This works with both API sources and database extractions.

How does dlt compare to Airbyte?

Airbyte is a platform with pre-built connectors and a web UI. dlt is a Python library where you write source logic in code. dlt is more flexible and lightweight but requires writing Python. Airbyte provides 300+ ready-made connectors with no coding needed.

Can I use dlt inside Jupyter notebooks?

Yes. dlt is designed to work in notebooks. You can define sources, run pipelines, and inspect results interactively. The DuckDB destination is especially convenient for notebook workflows since it requires no external database setup.
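
A minimal notebook-style sketch, assuming the github_pipeline from the example above has already been run against local DuckDB (the queried columns are illustrative):

import dlt

# Re-attach to the pipeline created earlier; its state and local DuckDB file persist
pipeline = dlt.pipeline(
    pipeline_name='github_pipeline',
    destination='duckdb',
    dataset_name='github_data'
)

# Query the loaded table straight from the notebook
with pipeline.sql_client() as client:
    for row in client.execute_sql('SELECT title, state FROM github_issues LIMIT 5'):
        print(row)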

Citations (3)
  • dlt GitHub — dlt is an open-source Python library for ELT pipelines
  • dlt Documentation — Automatic schema inference, incremental loading, and normalization
  • dlt Destinations — Supports BigQuery, Snowflake, DuckDB, and other destinations
