ScriptsApr 16, 2026·3 min read

dbt — Data Build Tool for SQL Transformations

Open-source framework for modeling, testing, and documenting SQL transformations in the modern data warehouse.

TL;DR
dbt lets analytics engineers write modular SQL with tests, docs, and dependency management.
§01

What it is

dbt (data build tool) turns SQL into a software-engineering discipline for the analytics layer of your data warehouse. Analytics engineers write modular SELECT statements, and dbt compiles them into views, tables, and incremental models with dependency graphs, tests, documentation, and CI.

dbt supports Snowflake, BigQuery, Redshift, Databricks, PostgreSQL, Spark, and many more data platforms through adapter plugins. It is used by thousands of data teams to manage their transformation layer.

§02

How it saves time or tokens

dbt eliminates the manual management of SQL dependencies and execution order. Instead of maintaining a list of which queries must run before others, dbt infers dependencies from ref() calls and builds a DAG automatically. Tests (not_null, unique, accepted_values, custom) run against every table after each build, catching data quality issues before they reach dashboards. The docs site is auto-generated from the project, eliminating separate documentation maintenance.

§03

How to use

  1. Install dbt with an adapter:
pip install dbt-postgres
  1. Scaffold a project and run:
dbt init analytics && cd analytics
# Configure ~/.dbt/profiles.yml with warehouse credentials
dbt deps          # Install packages
dbt seed          # Load reference CSVs
dbt run           # Materialize models
dbt test          # Run schema + data tests
dbt docs generate && dbt docs serve  # Browse lineage
  1. Write a model as a SELECT statement:
-- models/marts/dim_customers.sql
WITH customers AS (
    SELECT * FROM {{ ref('stg_customers') }}
),
orders AS (
    SELECT * FROM {{ ref('stg_orders') }}
)
SELECT
    c.customer_id,
    c.name,
    c.email,
    COUNT(o.order_id) AS total_orders,
    SUM(o.amount) AS lifetime_value
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY 1, 2, 3
§04

Example

Testing and documenting a dbt model:

# models/marts/schema.yml
version: 2
models:
  - name: dim_customers
    description: Customer dimension with lifetime value
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null
      - name: email
        tests:
          - unique
          - not_null
      - name: lifetime_value
        tests:
          - not_null
          - dbt_utils.accepted_range:
              min_value: 0
§05

Related on TokRepo

§06

Common pitfalls

  • Not using ref() for model dependencies breaks the DAG and causes stale data. Always reference other models with {{ ref('model_name') }}.
  • Running dbt run without dbt test means data quality issues go undetected. Always run tests after builds in your CI pipeline.
  • Writing all models as tables wastes warehouse storage. Use views for lightweight transformations and tables only for heavily queried models.

Frequently Asked Questions

What data platforms does dbt support?+

dbt supports Snowflake, BigQuery, Redshift, Databricks, PostgreSQL, Spark, DuckDB, Trino, and more through adapter plugins. Each adapter handles dialect-specific SQL compilation.

What is the difference between dbt Core and dbt Cloud?+

dbt Core is the open-source CLI tool. dbt Cloud is a paid SaaS platform that adds a web IDE, job scheduling, CI/CD integration, and collaboration features on top of dbt Core.

How does dbt handle incremental models?+

Incremental models only process new or changed rows instead of rebuilding the entire table. dbt tracks a watermark column (like updated_at) and only transforms rows newer than the last run, saving compute costs.

Can dbt run tests on my data?+

Yes. dbt supports schema tests (unique, not_null, accepted_values, relationships) and custom data tests written as SQL queries. Tests run against materialized tables and fail the build if violations are found.

What is the dbt DAG?+

The DAG (Directed Acyclic Graph) is the dependency graph that dbt builds from ref() calls in your models. It determines the execution order so upstream models are built before downstream models that depend on them.

Citations (3)

Discussion

Sign in to join the discussion.
No comments yet. Be the first to share your thoughts.

Related Assets