Introduction
Apache Iceberg is a high-performance table format designed for huge analytical datasets. It delivers ACID transactions, hidden partitioning, schema evolution, and time travel on top of Parquet/ORC/Avro files — without locking you into any specific query engine. Spark, Trino, Flink, Presto, Snowflake, BigQuery, DuckDB, Dremio, Athena, and others all read Iceberg natively.
What Iceberg Does
- Stores massive tables as immutable data files tracked by metadata snapshots.
- Enables atomic commits, concurrent writes, and isolation without locking the whole table.
- Supports schema evolution (add/drop/rename columns) and partition evolution.
- Provides time travel and rollback via snapshot-level reads.
- Powers vendor-neutral lakehouses: the same table works across many engines.
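The snapshot-based time travel mentioned above can be sketched in a few lines. This is a conceptual model, not Iceberg's actual API: snapshot IDs, timestamps, and file names here are invented, and a real table tracks files through manifests rather than a flat tuple.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int
    data_files: tuple  # files visible at this snapshot (simplified)

# Hypothetical history: every commit produces a new immutable snapshot.
history = [
    Snapshot(1, 1_000, ("a.parquet",)),
    Snapshot(2, 2_000, ("a.parquet", "b.parquet")),
    Snapshot(3, 3_000, ("b.parquet", "c.parquet")),  # a.parquet rewritten
]

def as_of(history, ts_ms):
    """Time travel: read the newest snapshot committed at or before ts_ms."""
    eligible = [s for s in history if s.timestamp_ms <= ts_ms]
    return max(eligible, key=lambda s: s.timestamp_ms)

print(as_of(history, 2_500).data_files)  # ('a.parquet', 'b.parquet')
```

Rollback is the same idea in reverse: pointing the table back at an older snapshot ID, with no data files touched.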
Architecture Overview
An Iceberg table is a three-layer tree: a current metadata JSON pointer, one manifest list per snapshot, and manifests that enumerate data files with per-column statistics. Engines prune partitions and files via these stats before reading any data. A catalog (REST, Nessie, Glue, Hive, Unity, JDBC, or the new Polaris) brokers atomic swaps of the metadata pointer to implement transactions.
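The stats-based pruning described above can be illustrated with a toy scan planner. The file names and stat layout are invented for the sketch; in a real table the min/max values live in manifest entries, keyed by column ID.

```python
# Each manifest entry carries per-column min/max bounds. A scan with a
# predicate like  col BETWEEN lo AND hi  skips any file whose bounds
# cannot overlap the predicate -- before a single data byte is read.
files = [
    {"path": "f1.parquet", "stats": {"event_time": ("2024-01-01", "2024-01-31")}},
    {"path": "f2.parquet", "stats": {"event_time": ("2024-02-01", "2024-02-29")}},
]

def prune(files, col, lo, hi):
    keep = []
    for f in files:
        fmin, fmax = f["stats"][col]
        if fmax >= lo and fmin <= hi:  # bound ranges overlap -> must scan
            keep.append(f["path"])
    return keep

print(prune(files, "event_time", "2024-02-10", "2024-02-15"))  # ['f2.parquet']
```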
Self-Hosting & Configuration
- Pick a catalog: REST (`iceberg-rest-fixture`), Nessie (Git-like), Hive Metastore, Glue, Unity, or JDBC.
- Store table data in S3/GCS/Azure/MinIO/HDFS with lifecycle policies that align with Iceberg retention.
- Configure properties like `write.format.default`, `write.target-file-size-bytes`, and `write.distribution-mode`.
- Use `rewrite_data_files` and `expire_snapshots` actions to compact and reclaim storage.
- Integrate with Spark, Flink, Trino, Dremio, StarRocks, ClickHouse, and DuckDB via first-class connectors.
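The write properties above are typically set at table creation. A sketch of the Spark SQL DDL, held in a Python string; the `demo.db.events` table name and column list are placeholders, and the property values shown are illustrative rather than recommended defaults.

```python
# Hypothetical DDL setting the table properties discussed above.
ddl = """
CREATE TABLE demo.db.events (id BIGINT, event_time TIMESTAMP)
USING iceberg
TBLPROPERTIES (
  'write.format.default' = 'parquet',
  'write.target-file-size-bytes' = '134217728',  -- 128 MiB
  'write.distribution-mode' = 'hash'
)
"""
```

The same properties can be changed later with `ALTER TABLE ... SET TBLPROPERTIES`, so tuning file sizes does not require recreating the table.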
Key Features
- Engine-agnostic: the same table is consumed by batch, streaming, and interactive engines.
- Hidden partitioning — users query by `event_time` while Iceberg maps to partition columns.
- Schema evolution that is metadata-only: no file rewrites.
- Row-level updates via copy-on-write (rewrite affected files) or merge-on-read (delete files, and deletion vectors in format v3, applied at read time).
- Branches and tags (Git-style semantics) for experimentation, backfills, and GDPR deletes.
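Hidden partitioning works because the table spec stores a *transform* from a source column to a partition value. A minimal sketch of Iceberg's `day()` transform (days since the Unix epoch); the timestamp value is an arbitrary example.

```python
from datetime import datetime, timezone

# Users filter on event_time; Iceberg applies the spec's transform to
# derive partition values and prune -- no partition column in queries.
def day_transform(ts: datetime) -> int:
    """Sketch of Iceberg's day() transform: days since 1970-01-01 UTC."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return (ts - epoch).days

ts = datetime(2024, 3, 15, 9, 30, tzinfo=timezone.utc)
print(day_transform(ts))  # partition value the user never types
```

Because the mapping is declared in metadata, the transform can later be swapped (say, `day()` to `hour()`) via partition evolution without rewriting existing files.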
Comparison with Similar Tools
- Delta Lake — ACID lakehouse format; tied tightly to Databricks ecosystem, now with UniForm interop.
- Apache Hudi — Focused on upserts and incremental pulls; Iceberg targets broader analytics.
- Hive tables — Directory-based; Iceberg replaces partition discovery with metadata-tracked files.
- Parquet alone — Great columnar storage, no table semantics; Iceberg layers ACID on top.
- Snowflake / BigQuery native — Closed; Iceberg keeps your data portable across engines.
FAQ
Q: Which engines can read Iceberg? A: Spark, Trino, Flink, Presto, Athena, Snowflake, BigQuery, DuckDB, Dremio, StarRocks, and many more.
Q: What catalog should I use? A: REST is emerging as the de-facto standard; Glue/Unity are common in cloud; Nessie adds Git-like branching.
Q: How do deletes work? A: Copy-on-write rewrites affected files, or merge-on-read writes delete files that readers merge in at query time.
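The merge-on-read path can be sketched as a reader-side filter. Row contents and positions here are invented; real positional delete files reference a data file path plus row ordinals, and equality deletes match on column values instead.

```python
# Merge-on-read sketch: writers append a small positional delete file
# instead of rewriting data; readers drop the flagged rows at scan time.
data_file = ["row0", "row1", "row2", "row3"]  # immutable data file
position_deletes = {1, 3}                     # ordinals deleted later

def scan(rows, deletes):
    """Reader-side merge: skip positions listed in delete files."""
    return [r for i, r in enumerate(rows) if i not in deletes]

print(scan(data_file, position_deletes))  # ['row0', 'row2']
```

Copy-on-write avoids this read-time merge by rewriting the affected data files at commit time, trading slower writes for faster scans; compaction (`rewrite_data_files`) eventually folds delete files back in either way.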
Q: Can Iceberg handle streaming? A: Yes — Flink and Spark Structured Streaming can upsert into Iceberg tables with exactly-once commits.