dbt — Data Build Tool for SQL Transformations
Open-source framework for modeling, testing, and documenting SQL transformations in the modern data warehouse.
What it is
dbt (data build tool) turns SQL into a software-engineering discipline for the analytics layer of your data warehouse. Analytics engineers write modular SELECT statements, and dbt compiles them into views, tables, and incremental models with dependency graphs, tests, documentation, and CI.
dbt supports Snowflake, BigQuery, Redshift, Databricks, PostgreSQL, Spark, and many more data platforms through adapter plugins. It is used by thousands of data teams to manage their transformation layer.
How it saves time or tokens
dbt eliminates the manual management of SQL dependencies and execution order. Instead of maintaining a list of which queries must run before others, dbt infers dependencies from ref() calls and builds a DAG automatically. Tests (not_null, unique, accepted_values, custom) run against every table after each build, catching data quality issues before they reach dashboards. The docs site is auto-generated from the project, eliminating separate documentation maintenance.
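To make the ref() mechanism concrete, here is a sketch of what a model looks like before and after compilation (the target database and schema names are assumptions; they come from your profile):

```sql
-- models/fct_orders.sql — what you write
SELECT order_id, amount FROM {{ ref('stg_orders') }}

-- target/compiled/... — roughly what dbt emits
-- ("analytics" and "dbt_prod" are placeholder database/schema names)
SELECT order_id, amount FROM "analytics"."dbt_prod"."stg_orders"
```

Because dbt resolves every ref() call itself, it also records the edge stg_orders → fct_orders in the DAG, which is how it knows the execution order.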
How to use
- Install dbt with an adapter:
pip install dbt-postgres
- Scaffold a project and run:
dbt init analytics && cd analytics
# Configure ~/.dbt/profiles.yml with warehouse credentials
dbt deps # Install packages
dbt seed # Load reference CSVs
dbt run # Materialize models
dbt test # Run schema + data tests
dbt docs generate && dbt docs serve # Browse lineage
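The profiles.yml mentioned above might look like the following minimal sketch for the dbt-postgres adapter (host, credentials, and schema are placeholders):

```yaml
# ~/.dbt/profiles.yml — the top-level key must match `profile:` in dbt_project.yml
analytics:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5432
      user: dbt_user                           # placeholder credentials
      password: "{{ env_var('DBT_PASSWORD') }}" # read from the environment, not committed
      dbname: warehouse
      schema: dbt_dev                          # models are built into this schema
      threads: 4
```

Keeping the password behind env_var() means the file can be shared without leaking secrets.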
- Write a model as a SELECT statement:
-- models/marts/dim_customers.sql
WITH customers AS (
    SELECT * FROM {{ ref('stg_customers') }}
),
orders AS (
    SELECT * FROM {{ ref('stg_orders') }}
)
SELECT
    c.customer_id,
    c.name,
    c.email,
    COUNT(o.order_id) AS total_orders,
    SUM(o.amount) AS lifetime_value
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY 1, 2, 3
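The stg_customers model referenced above would itself wrap a raw table declared as a dbt source; a sketch (the source name, raw table, and column names are assumptions):

```sql
-- models/staging/stg_customers.sql
-- Assumes a source named 'raw' with a 'customers' table, declared in a sources .yml file
SELECT
    id        AS customer_id,
    full_name AS name,
    email
FROM {{ source('raw', 'customers') }}
```

Using source() instead of hard-coded table names lets dbt include raw tables in the lineage graph and check their freshness.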
Example
Testing and documenting a dbt model:
# models/marts/schema.yml
version: 2
models:
  - name: dim_customers
    description: Customer dimension with lifetime value
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null
      - name: email
        tests:
          - unique
          - not_null
      - name: lifetime_value
        tests:
          - not_null
          - dbt_utils.accepted_range:
              min_value: 0
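Beyond these generic schema tests, a singular data test is any SQL file under tests/ that returns the rows violating an assertion; the build fails if it returns anything. A sketch cross-checking the model above (the file name is an assumption):

```sql
-- tests/assert_lifetime_value_matches_orders.sql
-- Returns customers whose lifetime_value disagrees with the sum of their orders
SELECT c.customer_id
FROM {{ ref('dim_customers') }} c
LEFT JOIN {{ ref('stg_orders') }} o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.lifetime_value
HAVING c.lifetime_value != COALESCE(SUM(o.amount), 0)
```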
Related on TokRepo
- Database tools — More data tools on TokRepo.
- Automation tools — Browse data pipeline automation.
Common pitfalls
- Not using ref() for model dependencies breaks the DAG and causes stale data. Always reference other models with {{ ref('model_name') }}.
- Running dbt run without dbt test means data quality issues go undetected. Run tests after every build in your CI pipeline, or use dbt build, which runs and tests each model in dependency order.
- Writing all models as tables wastes warehouse storage. Use views for lightweight transformations and tables only for heavily queried models.
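The view-versus-table guidance above can be expressed once for a whole folder in dbt_project.yml rather than per model; a sketch (the project name "analytics" and folder layout are assumptions):

```yaml
# dbt_project.yml — default everything to views, materialize only the marts layer as tables
models:
  analytics:
    +materialized: view
    marts:
      +materialized: table
```

Individual models can still override this with a `{{ config(materialized='table') }}` block at the top of the SQL file.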
Frequently Asked Questions
Which data warehouses does dbt support?
dbt supports Snowflake, BigQuery, Redshift, Databricks, PostgreSQL, Spark, DuckDB, Trino, and more through adapter plugins. Each adapter handles dialect-specific SQL compilation.
What is the difference between dbt Core and dbt Cloud?
dbt Core is the open-source CLI tool. dbt Cloud is a paid SaaS platform that adds a web IDE, job scheduling, CI/CD integration, and collaboration features on top of dbt Core.
What are incremental models?
Incremental models only process new or changed rows instead of rebuilding the entire table. dbt tracks a watermark column (like updated_at) and only transforms rows newer than the last run, saving compute costs.
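A minimal incremental model sketch using is_incremental() (the event table and updated_at column are assumptions):

```sql
-- models/fct_events.sql
{{ config(materialized='incremental', unique_key='event_id') }}

SELECT event_id, user_id, event_type, updated_at
FROM {{ ref('stg_events') }}
{% if is_incremental() %}
  -- On incremental runs, only pick up rows newer than what is already built;
  -- {{ this }} refers to the existing target table
  WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
```

On the first run (or with --full-refresh) the if-block is skipped and the whole table is built.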
Does dbt support testing?
Yes. dbt supports schema tests (unique, not_null, accepted_values, relationships) and custom data tests written as SQL queries. Tests run against materialized tables and fail the build if violations are found.
What is the DAG?
The DAG (Directed Acyclic Graph) is the dependency graph that dbt builds from ref() calls in your models. It determines the execution order so upstream models are built before downstream models that depend on them.
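The DAG also drives dbt's node selection syntax, so you can rebuild just one model and its neighbors instead of the whole project; a few examples using the model names from this page:

```shell
dbt run --select dim_customers      # just this model
dbt run --select +dim_customers     # the model plus all upstream dependencies
dbt run --select stg_orders+        # the model plus everything downstream of it
dbt build --select dim_customers+   # run and test the downstream subgraph
```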