Skills · May 13, 2026 · 3 min read

LakeFS — Git-Like Version Control for Data Lakes

LakeFS adds Git-like branching, committing, and merging to your data lake on S3, GCS, or Azure Blob Storage, enabling reproducible data pipelines and zero-copy experimentation.

Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, an adapter-specific plan, and the raw content to help agents assess fit, risk, and next actions.

Native · 98/100 · Policy: allow
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established
Entry point: LakeFS Data Versioning
Universal CLI command:
npx tokrepo install 5b7a9740-4ecb-11f1-9bc6-00163e2b0d79

Introduction

LakeFS brings version control semantics to object storage. Data engineers can create branches, run experimental transformations in isolation, diff the results against production, and merge — all without copying data. It acts as a gateway that intercepts S3-compatible API calls and manages versioned metadata.

What LakeFS Does

  • Provides Git-like branching, committing, merging, and reverting for data stored in object storage
  • Exposes an S3-compatible API so existing tools (Spark, Trino, dbt, Airflow) work unchanged
  • Enables zero-copy branching — branches share underlying data until changes diverge
  • Tracks lineage and enables data diffing between any two references
  • Supports pre-merge and pre-commit hooks for data quality validation
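
To make the zero-copy branching claim concrete, here is a minimal toy model in Python. It is not LakeFS code: the `ToyRepo` class, its methods, and the `obj-N` addresses are hypothetical names invented for illustration. The sketch only shows the general idea that a branch is a mapping from logical paths to stored objects, so creating a branch writes no data.

```python
"""Toy model of zero-copy branching. Illustrative only; not the LakeFS implementation."""


class ToyRepo:
    def __init__(self):
        # Physical storage: address -> bytes (stands in for the object store).
        self.objects = {}
        # Each branch maps logical paths to physical addresses.
        self.branches = {"main": {}}
        self._next_id = 0

    def put(self, branch, path, data):
        # Every write lands at a fresh address; existing objects are never overwritten.
        address = f"obj-{self._next_id}"
        self._next_id += 1
        self.objects[address] = data
        self.branches[branch][path] = address

    def get(self, branch, path):
        return self.objects[self.branches[branch][path]]

    def create_branch(self, name, source):
        # Only the path -> address mapping is copied; no object bytes are duplicated.
        self.branches[name] = dict(self.branches[source])


repo = ToyRepo()
repo.put("main", "events/part-0.parquet", b"v1")
repo.put("main", "events/part-1.parquet", b"v1")
objects_before = len(repo.objects)

repo.create_branch("experiment", "main")
objects_after_branch = len(repo.objects)  # unchanged: branching wrote no data

repo.put("experiment", "events/part-0.parquet", b"v2")  # the branches diverge here
print(objects_before, objects_after_branch, len(repo.objects))
print(repo.get("main", "events/part-0.parquet"), repo.get("experiment", "events/part-0.parquet"))
```

The toy copies the whole path mapping when branching, which keeps the code short. A production system would presumably share the metadata structure too, so that branch creation does not scale with the number of objects.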

Architecture Overview

LakeFS runs as a stateless Go service backed by a metadata store (PostgreSQL, or DynamoDB on AWS) and your existing object store (S3, GCS, or Azure Blob Storage) for data. When a client writes via the S3 gateway, LakeFS records the object under the target branch. A commit creates an immutable snapshot of the metadata tree. A merge performs a three-way diff on metadata pointers, not on data bytes, so its cost depends on how many entries changed rather than on how many bytes the dataset holds.
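
The merge step can be sketched as a three-way comparison over path-to-address maps. This is a simplified illustration, assuming each commit can be viewed as a flat dictionary; the function name `three_way_merge`, the `MergeConflict` exception, and the sample addresses are hypothetical and do not reflect LakeFS internals.

```python
"""Toy three-way merge over path -> address maps. Illustrative only."""


class MergeConflict(Exception):
    pass


def three_way_merge(base, source, dest):
    """Merge `source` into `dest`, using `base` as the common ancestor.

    Each argument maps a logical path to a physical address and omits the
    path if the object does not exist on that side. Only addresses are
    compared; object bytes are never read.
    """
    merged = {}
    for path in set(base) | set(source) | set(dest):
        b, s, d = base.get(path), source.get(path), dest.get(path)
        if s == d:
            winner = s  # both sides agree (including "both deleted")
        elif s == b:
            winner = d  # only dest changed
        elif d == b:
            winner = s  # only source changed
        else:
            raise MergeConflict(path)  # both sides changed it differently
        if winner is not None:
            merged[path] = winner
    return merged


base = {"a.parquet": "obj-1", "b.parquet": "obj-2", "c.parquet": "obj-3"}
# Source branch: rewrote a.parquet and deleted c.parquet.
source = {"a.parquet": "obj-9", "b.parquet": "obj-2"}
# Destination branch: added d.parquet.
dest = {"a.parquet": "obj-1", "b.parquet": "obj-2", "c.parquet": "obj-3", "d.parquet": "obj-7"}

merged = three_way_merge(base, source, dest)
print(sorted(merged.items()))
```

Because the comparison touches only pointers, merging a rewritten multi-gigabyte file costs the same as merging a one-byte file in this model.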

Self-Hosting & Configuration

  • Deploy via Docker, Kubernetes Helm chart, or native binaries
  • Requires PostgreSQL (or DynamoDB on AWS) for metadata storage
  • Configure the blockstore backend (S3, GCS, Azure, or local filesystem)
  • Set up authentication via built-in users, LDAP, or OIDC
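  • Integrate with Airflow, Spark, or dbt by pointing them at the S3-compatible endpoint and addressing objects as s3://<repository>/<branch>/<path> (s3a:// in Spark); lakefs:// URIs belong to lakectl and the LakeFS-specific Hadoop client, not to the S3 gateway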

Key Features

  • Zero-copy branching — create branches instantly without duplicating data
  • S3-compatible gateway for transparent integration with any S3-aware tool
  • Pre-commit and pre-merge hooks for automated data validation
  • Web UI and CLI for browsing repositories, diffs, and commit history
  • Open source under the Apache 2.0 license with an active community
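
As an illustration of the kind of check a pre-merge hook could run, the sketch below validates the list of paths changed on a branch. It shows only the check logic, not how LakeFS hooks are configured or invoked. The rules, the `tables/` layout, and the function name `find_violations` are assumptions made for the example.

```python
"""Sketch of a data quality check a pre-merge hook could run. Rules are hypothetical."""
from pathlib import PurePosixPath

ALLOWED_SUFFIXES = {".parquet"}  # assumption: tables hold Parquet files only
TABLE_PREFIX = "tables/"         # assumption: hypothetical repository layout


def find_violations(changed_paths):
    """Return human-readable reasons to block the merge (empty list means pass)."""
    violations = []
    for path in changed_paths:
        pure = PurePosixPath(path)
        if "_temporary" in pure.parts:
            violations.append(f"{path}: temporary job output must not be merged")
        elif path.startswith(TABLE_PREFIX) and pure.suffix not in ALLOWED_SUFFIXES:
            violations.append(f"{path}: only Parquet files are allowed under {TABLE_PREFIX}")
    return violations


changed = [
    "tables/orders/part-0000.parquet",
    "tables/orders/_temporary/attempt_0/part-0001.parquet",
    "tables/orders/notes.csv",
    "docs/readme.txt",
]
violations = find_violations(changed)
for violation in violations:
    print(violation)
```

A hook built around a function like this would reject the merge whenever the returned list is non-empty, which keeps unvalidated files off the protected branch.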

Comparison with Similar Tools

  • Delta Lake — table format with ACID transactions and time travel; LakeFS works at the object storage level across any file format
  • DVC — Git-based data versioning for ML experiments; LakeFS versions entire data lakes with branching semantics
  • Apache Iceberg — table format with snapshot isolation; LakeFS provides repository-level versioning independent of table format
  • Nessie — Git-like catalog for Iceberg tables; LakeFS is format-agnostic and operates at the storage layer

FAQ

Q: Does branching duplicate my data? A: No. LakeFS uses copy-on-write at the metadata level. Branches share the same underlying objects until changes are made.

Q: Can I use LakeFS with Spark? A: Yes. Point Spark's S3A connector at the LakeFS S3 gateway endpoint and read or write paths of the form s3a://<repository>/<branch>/<path>. Apart from the endpoint, credentials, and paths, job code stays the same.
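
A minimal sketch of the Spark side, assuming access goes through the S3 gateway with Hadoop's S3A connector, where the repository takes the place of the bucket and the branch is the first path component. The property names are standard Hadoop S3A settings; the endpoint, credentials, repository, and helper function names are placeholders invented for this example.

```python
"""Build Spark settings for an S3-compatible gateway. All values are placeholders."""


def s3a_gateway_conf(endpoint, access_key, secret_key):
    # Standard Hadoop S3A properties, prefixed with `spark.hadoop.` for Spark.
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Path-style requests keep the repository name in the path, not the hostname.
        "spark.hadoop.fs.s3a.path.style.access": "true",
    }


def gateway_uri(repository, ref, path):
    # The repository stands in for the bucket; the branch (or other ref)
    # is the first component of the object key.
    return f"s3a://{repository}/{ref}/{path.lstrip('/')}"


conf = s3a_gateway_conf("https://lakefs.example.com", "<access-key-id>", "<secret-access-key>")
uri = gateway_uri("example-repo", "experiment", "tables/orders/")
print(uri)
```

With PySpark installed, each entry in `conf` can be passed to `SparkSession.builder.config(key, value)`, after which `spark.read.parquet(uri)` reads from the chosen branch. Switching branches is then a matter of changing the `ref` argument.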

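Q: What happens if LakeFS goes down? A: The objects stay in your object store, because LakeFS only manages metadata and does not move or transform your data. Reads and writes through LakeFS are unavailable until the service is back, and the mapping from branches and logical paths to stored objects lives in LakeFS metadata, so reading the bucket directly bypasses versioning.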

Q: Does it support garbage collection? A: Yes. A built-in GC process reclaims unreferenced objects from deleted branches or old commits.
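
The idea behind reclaiming unreferenced objects can be sketched as a mark-and-sweep pass over commit metadata. This is a toy model, not the LakeFS garbage collector: the function names, commit IDs, and addresses are hypothetical, and retention rules for old commits are deliberately left out.

```python
"""Toy mark-and-sweep over commit metadata. Illustrative only; not the LakeFS GC."""


def reachable_commits(branch_heads, parents):
    """Walk parent links starting from every branch head."""
    seen, stack = set(), list(branch_heads.values())
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def unreferenced_objects(all_objects, branch_heads, parents, commit_objects):
    """Return stored addresses that no reachable commit points to."""
    live = set()
    for commit in reachable_commits(branch_heads, parents):
        live |= commit_objects[commit]
    return set(all_objects) - live


# Commit x1 lived on a branch that has since been deleted.
parents = {"c1": [], "c2": ["c1"], "x1": ["c1"]}
commit_objects = {
    "c1": {"obj-1"},
    "c2": {"obj-1", "obj-2"},
    "x1": {"obj-1", "obj-3"},
}
branch_heads = {"main": "c2"}
all_objects = {"obj-1", "obj-2", "obj-3"}

garbage = unreferenced_objects(all_objects, branch_heads, parents, commit_objects)
print(sorted(garbage))
```

Here only `obj-3` is eligible for deletion, since `obj-1` is still referenced from the history of `main` even though the deleted branch also pointed at it.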
