Apr 16, 2026 · 3 min read

Delta Lake — Open Storage Format for the Lakehouse

ACID transactions, time travel, and schema evolution for your data lake on top of Parquet and object storage.

Introduction

Delta Lake is an open-source storage layer that brings ACID transactions, scalable metadata, and time travel to Parquet files sitting on cloud object storage. It is the foundation of the Databricks lakehouse architecture, and also runs on Spark, Trino, Presto, Flink, Hive, and a growing set of pure-Rust/Python clients.

What Delta Lake Does

  • Adds ACID transactions to Parquet tables via a JSON + checkpointed transaction log.
  • Supports MERGE/UPDATE/DELETE, time travel, and schema evolution.
  • Handles concurrent writers with optimistic concurrency control.
  • Integrates with Spark SQL, Structured Streaming, Flink, Trino, Athena, and more.
  • Provides Z-order, Liquid Clustering, and data skipping for fast analytic queries.
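To make the MERGE bullet concrete, here is a minimal stdlib sketch of upsert semantics (matched rows updated, unmatched rows inserted), modeled on plain dicts. The names `target` and `source` are illustrative only, not a Delta API; in Spark SQL this is expressed as `MERGE INTO target USING source ...`.

```python
# Sketch of MERGE (upsert) semantics on plain dicts keyed by id.
# WHEN MATCHED THEN UPDATE, WHEN NOT MATCHED THEN INSERT.

def merge(target: dict, source: dict) -> dict:
    merged = dict(target)      # start from the current table state
    for key, row in source.items():
        merged[key] = row      # update if the key exists, insert otherwise
    return merged

target = {1: {"name": "alice", "tier": "free"},
          2: {"name": "bob",   "tier": "pro"}}
source = {2: {"name": "bob",   "tier": "free"},   # matched -> update
          3: {"name": "carol", "tier": "pro"}}    # not matched -> insert

result = merge(target, source)
```

In a real Delta table the same operation rewrites only the Parquet files that contain matched rows and records the swap in the transaction log.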

Architecture Overview

A Delta table is a directory of Parquet data files plus a _delta_log directory containing an ordered sequence of JSON commits (*.json) and periodic Parquet checkpoints. Each commit records added/removed files and metadata changes. Readers reconstruct the latest snapshot from the log; writers append commits using optimistic concurrency and file-level conflict detection.
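The replay described above can be sketched in a few lines of stdlib Python. This is a simplified model with hypothetical file names: real commits also carry metaData/protocol actions, and readers start from the latest Parquet checkpoint rather than version 0.

```python
# Sketch: reconstructing a Delta snapshot from _delta_log.
# Each numbered JSON commit holds "add"/"remove" actions; replaying them
# in order yields the set of live Parquet files.

commits = [
    # 00000000000000000000.json
    [{"add": {"path": "part-0000.parquet"}},
     {"add": {"path": "part-0001.parquet"}}],
    # 00000000000000000001.json (e.g. a DELETE rewrote part-0001)
    [{"remove": {"path": "part-0001.parquet"}},
     {"add": {"path": "part-0002.parquet"}}],
]

def snapshot(commits, version=None):
    """Replay the log up to `version` (inclusive); None = latest."""
    live = set()
    for v, actions in enumerate(commits):
        if version is not None and v > version:
            break
        for action in actions:
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live
```

Note that `snapshot(commits, version=0)` is exactly time travel: replaying the same log, just stopping earlier.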

Self-Hosting & Configuration

  • Runtime options: Spark (io.delta:delta-spark), Flink, Trino (delta connector), Presto, Hive, Python (deltalake), Rust.
  • Use Databricks, EMR, Dataproc, or self-managed Spark/Flink with S3/GCS/Azure Blob/MinIO.
  • Enable Unity Catalog or Hive Metastore/Glue to register Delta tables for multi-engine access.
  • Tune retention with delta.deletedFileRetentionDuration and delta.logRetentionDuration.
  • Optimize layout with OPTIMIZE + ZORDER BY (or Liquid Clustering) on hot query keys.
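The retention tuning above exists because removed files are not deleted immediately: VACUUM only reclaims a file once its tombstone is older than delta.deletedFileRetentionDuration, so time travel and in-flight readers keep working. A stdlib sketch of that check, with hypothetical file names and timestamps:

```python
# Sketch: the retention logic behind VACUUM.
# A file removed from the snapshot is only physically deleted once it has
# been tombstoned longer than the retention window (default 7 days).
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=7)   # delta.deletedFileRetentionDuration
now = datetime(2026, 4, 16, tzinfo=timezone.utc)

# (path, deletion_timestamp) tombstones taken from "remove" actions in the log
tombstones = [
    ("part-0001.parquet", now - timedelta(days=30)),  # old enough to vacuum
    ("part-0005.parquet", now - timedelta(hours=2)),  # still retained
]

vacuumable = [path for path, deleted_at in tombstones
              if now - deleted_at > RETENTION]
```

Shortening the retention window reclaims storage faster but limits how far back time travel can reach.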

Key Features

  • ACID transactions on S3/GCS/Azure Blob/HDFS — no extra database required.
  • Time travel with VERSION AS OF / TIMESTAMP AS OF for audits and rollbacks.
  • Schema evolution and enforcement via mergeSchema and constraints.
  • Change Data Feed (CDF) lets downstream consumers read row-level changes.
  • UniForm: expose a Delta table as Iceberg or Hudi metadata for cross-engine reads.
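The Change Data Feed bullet can be illustrated with a small stdlib model. With delta.enableChangeDataFeed enabled, commits also record row-level changes tagged with _change_type and a commit version, so a consumer can ask for "changes between versions". The rows and the `table_changes` helper below are hypothetical stand-ins for the real CDF reader:

```python
# Sketch: what the Change Data Feed exposes to downstream consumers.
# Each changed row carries _change_type (insert / delete / update_preimage /
# update_postimage) plus the commit version that produced it.

cdf_rows = [
    {"id": 1, "tier": "free", "_change_type": "insert",           "_commit_version": 1},
    {"id": 1, "tier": "free", "_change_type": "update_preimage",  "_commit_version": 2},
    {"id": 1, "tier": "pro",  "_change_type": "update_postimage", "_commit_version": 2},
    {"id": 2, "tier": "pro",  "_change_type": "delete",           "_commit_version": 3},
]

def table_changes(rows, starting_version, ending_version):
    """Return the row-level changes committed in the given version range."""
    return [r for r in rows
            if starting_version <= r["_commit_version"] <= ending_version]

changes = table_changes(cdf_rows, 2, 3)
```

This is what lets incremental pipelines consume only the delta between two versions instead of rescanning the table.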

Comparison with Similar Tools

  • Apache Iceberg — Similar ACID lakehouse format with stronger multi-engine catalog story.
  • Apache Hudi — Optimized for upserts and incremental pulls; Delta focuses on simple + fast analytics.
  • Hive ACID — Older, metastore-heavy; Delta is log-based, cloud-native, and vendor-neutral.
  • Parquet alone — No ACID or time travel; Delta adds them without rewriting data.
  • BigLake / Snowflake Iceberg — Managed lakehouse catalogs; Delta is OSS and engine-agnostic.

FAQ

Q: Does Delta Lake require Spark? A: No — delta-rs provides a pure Rust library with Python bindings, and Trino/Flink connectors exist too.

Q: How does concurrency work? A: Writers use optimistic concurrency: commit is a conditional put on the next log file; conflicts are re-checked against file-level overlaps.
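The "conditional put" in that answer can be sketched with the filesystem's atomic create-if-absent flag, which is a reasonable stand-in for the object-store primitive Delta relies on. The directory and payloads here are illustrative, not the real commit format:

```python
# Sketch: optimistic concurrency via a conditional put on the next log file.
# A writer commits version N by creating _delta_log/N.json only if it does
# not already exist (O_CREAT | O_EXCL). The loser re-checks its read set
# for file-level conflicts and retries at the next version.
import os
import tempfile

log_dir = tempfile.mkdtemp()

def try_commit(version: int, payload: str) -> bool:
    path = os.path.join(log_dir, f"{version:020d}.json")
    try:
        # Atomic create-if-absent: fails if another writer won this version.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False              # lost the race -> conflict check, then retry
    with os.fdopen(fd, "w") as f:
        f.write(payload)
    return True

first = try_commit(0, '{"add": {}}')    # wins version 0
second = try_commit(0, '{"add": {}}')   # loses; would retry as version 1
```

If the losing writer's changed files do not overlap with the winning commit, the retry succeeds without re-running the job.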

Q: Can I query Delta from Athena or BigQuery? A: Yes — Athena supports Delta reads natively, and BigLake/Trino/Presto provide connectors.

Q: What is UniForm? A: A feature that writes Iceberg (and Hudi) metadata alongside Delta, enabling multi-format readers on the same data files.
