Introduction
Apache Iceberg is a high-performance table format designed for huge analytical datasets. It delivers ACID transactions, hidden partitioning, schema evolution, and time travel on top of Parquet/ORC/Avro files — without locking you into any specific query engine. Spark, Trino, Flink, Presto, Snowflake, BigQuery, DuckDB, Dremio, Athena, and others all read Iceberg natively.
What Iceberg Does
- Stores massive tables as immutable data files tracked by metadata snapshots.
- Enables atomic commits, concurrent writes, and isolation without locking the whole table.
- Supports schema evolution (add/drop/rename columns) and partition evolution.
- Provides time travel and rollback via snapshot-level reads.
- Powers vendor-neutral lakehouses: the same table works across many engines.
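The snapshot-based time travel mentioned above can be sketched in a few lines. This is a conceptual model, not Iceberg's actual API: snapshot IDs, timestamps, and file names here are invented, and a real table tracks files through manifests rather than a flat tuple.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int
    data_files: tuple  # files visible at this snapshot (simplified)

# Hypothetical history: every commit produces a new immutable snapshot.
history = [
    Snapshot(1, 1_000, ("a.parquet",)),
    Snapshot(2, 2_000, ("a.parquet", "b.parquet")),
    Snapshot(3, 3_000, ("b.parquet", "c.parquet")),  # a.parquet rewritten
]

def as_of(history, ts_ms):
    """Time travel: read the newest snapshot committed at or before ts_ms."""
    eligible = [s for s in history if s.timestamp_ms <= ts_ms]
    return max(eligible, key=lambda s: s.timestamp_ms)

print(as_of(history, 2_500).data_files)  # ('a.parquet', 'b.parquet')
```

Rollback is the same idea in reverse: pointing the table back at an older snapshot ID, with no data files touched.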
Architecture Overview
An Iceberg table is a three-layer tree: a current metadata JSON pointer, one manifest list per snapshot, and manifests that enumerate data files with per-column statistics. Engines prune partitions and files via these stats before reading any data. A catalog (REST, Nessie, Glue, Hive, Unity, JDBC, or the new Polaris) brokers atomic swaps of the metadata pointer to implement transactions.
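The stats-based pruning described above can be illustrated with a toy scan planner. The file names and stat layout are invented for the sketch; in a real table the min/max values live in manifest entries, keyed by column ID.

```python
# Each manifest entry carries per-column min/max bounds. A scan with a
# predicate like  col BETWEEN lo AND hi  skips any file whose bounds
# cannot overlap the predicate -- before a single data byte is read.
files = [
    {"path": "f1.parquet", "stats": {"event_time": ("2024-01-01", "2024-01-31")}},
    {"path": "f2.parquet", "stats": {"event_time": ("2024-02-01", "2024-02-29")}},
]

def prune(files, col, lo, hi):
    keep = []
    for f in files:
        fmin, fmax = f["stats"][col]
        if fmax >= lo and fmin <= hi:  # bound ranges overlap -> must scan
            keep.append(f["path"])
    return keep

print(prune(files, "event_time", "2024-02-10", "2024-02-15"))  # ['f2.parquet']
```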
Self-Hosting & Configuration
- Pick a catalog: REST (`iceberg-rest-fixture`), Nessie (Git-like), Hive Metastore, Glue, Unity, or JDBC.
- Store table data in S3/GCS/Azure/MinIO/HDFS with lifecycle policies that align with Iceberg retention.
- Configure properties like `write.format.default`, `write.target-file-size-bytes`, and `write.distribution-mode`.
- Use `rewrite_data_files` and `expire_snapshots` actions to compact and reclaim storage.
- Integrate with Spark, Flink, Trino, Dremio, StarRocks, ClickHouse, and DuckDB via first-class connectors.
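The write properties above are typically set at table creation. A sketch of the Spark SQL DDL, held in a Python string; the `demo.db.events` table name and column list are placeholders, and the property values shown are illustrative rather than recommended defaults.

```python
# Hypothetical DDL setting the table properties discussed above.
ddl = """
CREATE TABLE demo.db.events (id BIGINT, event_time TIMESTAMP)
USING iceberg
TBLPROPERTIES (
  'write.format.default' = 'parquet',
  'write.target-file-size-bytes' = '134217728',  -- 128 MiB
  'write.distribution-mode' = 'hash'
)
"""
```

The same properties can be changed later with `ALTER TABLE ... SET TBLPROPERTIES`, so tuning file sizes does not require recreating the table.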
Key Features
- Engine-agnostic: the same table is consumed by batch, streaming, and interactive engines.
- Hidden partitioning — users query by `event_time` while Iceberg maps to partition columns.
- Schema evolution that is metadata-only: no file rewrites.
- Row-level updates via copy-on-write (rewrite affected files) or merge-on-read (delete files, and deletion vectors in format v3, applied at read time).
- Branches and tags (Git-style semantics) for experimentation, backfills, and GDPR deletes.
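Hidden partitioning works because the table spec stores a *transform* from a source column to a partition value. A minimal sketch of Iceberg's `day()` transform (days since the Unix epoch); the timestamp value is an arbitrary example.

```python
from datetime import datetime, timezone

# Users filter on event_time; Iceberg applies the spec's transform to
# derive partition values and prune -- no partition column in queries.
def day_transform(ts: datetime) -> int:
    """Sketch of Iceberg's day() transform: days since 1970-01-01 UTC."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return (ts - epoch).days

ts = datetime(2024, 3, 15, 9, 30, tzinfo=timezone.utc)
print(day_transform(ts))  # partition value the user never types
```

Because the mapping is declared in metadata, the transform can later be swapped (say, `day()` to `hour()`) via partition evolution without rewriting existing files.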
Comparison with Similar Tools
- Delta Lake — ACID lakehouse format; tied tightly to Databricks ecosystem, now with UniForm interop.
- Apache Hudi — Focused on upserts and incremental pulls; Iceberg targets broader analytics.
- Hive tables — Directory-based; Iceberg replaces partition discovery with metadata-tracked files.
- Parquet alone — Great columnar storage, no table semantics; Iceberg layers ACID on top.
- Snowflake / BigQuery native — Closed; Iceberg keeps your data portable across engines.
FAQ
Q: Which engines can read Iceberg? A: Spark, Trino, Flink, Presto, Athena, Snowflake, BigQuery, DuckDB, Dremio, StarRocks, and many more.
Q: What catalog should I use? A: REST is emerging as the de-facto standard; Glue/Unity are common in cloud; Nessie adds Git-like branching.
Q: How do deletes work? A: Copy-on-write rewrites affected files, or merge-on-read writes delete files that readers merge in at query time.
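The merge-on-read path can be sketched as a reader-side filter. Row contents and positions here are invented; real positional delete files reference a data file path plus row ordinals, and equality deletes match on column values instead.

```python
# Merge-on-read sketch: writers append a small positional delete file
# instead of rewriting data; readers drop the flagged rows at scan time.
data_file = ["row0", "row1", "row2", "row3"]  # immutable data file
position_deletes = {1, 3}                     # ordinals deleted later

def scan(rows, deletes):
    """Reader-side merge: skip positions listed in delete files."""
    return [r for i, r in enumerate(rows) if i not in deletes]

print(scan(data_file, position_deletes))  # ['row0', 'row2']
```

Copy-on-write avoids this read-time merge by rewriting the affected data files at commit time, trading slower writes for faster scans; compaction (`rewrite_data_files`) eventually folds delete files back in either way.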
Q: Can Iceberg handle streaming? A: Yes — Flink and Spark Structured Streaming can upsert into Iceberg tables with exactly-once commits.