Configs · May 21, 2026 · 3 min read

Amundsen — Open-Source Data Discovery and Metadata Platform

A data discovery and metadata engine by LF AI & Data that helps data teams find, understand, and trust their data assets.

Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, an adapter-specific plan, and the raw content to help agents assess fit, risk, and next steps.

Native · 98/100 · Policy: allow
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established
Entry point: Amundsen Overview
Universal CLI command: npx tokrepo install bb17c50e-5551-11f1-9bc6-00163e2b0d79

Introduction

Amundsen is a data discovery and metadata platform originally built at Lyft and now maintained under the LF AI & Data Foundation. It helps data engineers, analysts, and scientists find the right datasets by providing a search interface, data lineage, ownership tracking, and usage statistics across an organization's data warehouses and data lakes.

What Amundsen Does

  • Indexes metadata from databases, warehouses, dashboards, and feature stores into a searchable catalog
  • Ranks search results by usage popularity and relevance signals
  • Tracks table-level and column-level lineage across data pipelines
  • Displays data owners, descriptions, tags, and freshness badges
  • Integrates with Airflow, dbt, Spark, and other tools to ingest metadata automatically

Architecture Overview

Amundsen consists of three microservices: a frontend service (Flask), a search service backed by Elasticsearch, and a metadata service backed by a graph database (Neo4j or Apache Atlas). Databuilder is a separate ETL framework that extracts metadata from source systems and loads it into the metadata and search stores. The frontend communicates with the backend services via REST APIs.
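The extract-and-load flow described above can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not Databuilder's actual API: `TableMetadata`, `InMemoryStores`, `extract`, and `run_job` are hypothetical names, the two dictionaries stand in for the graph store and the Elasticsearch index, and the `database://cluster.schema/table` key layout is assumed for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, Set


@dataclass(frozen=True)
class TableMetadata:
    """Hypothetical record type; not Databuilder's real model class."""
    database: str
    cluster: str
    schema: str
    name: str
    description: str = ""

    @property
    def key(self) -> str:
        # Assumed key layout: database://cluster.schema/table
        return f"{self.database}://{self.cluster}.{self.schema}/{self.name}"


def extract(rows: Iterable[Dict[str, str]]) -> Iterator[TableMetadata]:
    """Extractor stage: turn raw source rows into metadata records."""
    for row in rows:
        yield TableMetadata(**row)


@dataclass
class InMemoryStores:
    """Stand-ins for the metadata store and the search index."""
    metadata: Dict[str, TableMetadata] = field(default_factory=dict)
    search_index: Dict[str, Set[str]] = field(default_factory=dict)

    def load(self, record: TableMetadata) -> None:
        """Loader stage: write one record to both stores."""
        self.metadata[record.key] = record
        # Index each lowercase word of the table name and description.
        text = f"{record.name} {record.description}".lower().replace("_", " ")
        for token in set(text.split()):
            self.search_index.setdefault(token, set()).add(record.key)


def run_job(rows: Iterable[Dict[str, str]], stores: InMemoryStores) -> int:
    """Run extract then load; return the number of tables now stored."""
    for record in extract(rows):
        stores.load(record)
    return len(stores.metadata)
```

Writing each record to both stores in one pass mirrors the split between the metadata service and the search service: the first answers "what is this table", the second answers "which tables match this query".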

Self-Hosting & Configuration

  • Deploy with Docker Compose for quick evaluation or Helm charts for Kubernetes production setups
  • Configure Databuilder extractors to connect to your Hive, PostgreSQL, BigQuery, Snowflake, or Redshift sources
  • Choose Neo4j or Apache Atlas as the metadata graph backend depending on your infrastructure
  • Set up Airflow DAGs to run Databuilder jobs on a schedule for continuous metadata ingestion
  • Customize the frontend with environment variables for branding, authentication, and feature flags
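As a rough illustration of environment-driven configuration, the sketch below resolves service endpoints from environment variables with local defaults. The variable names `SEARCHSERVICE_BASE` and `METADATASERVICE_BASE`, the ports 5001 and 5002, and the `SITE_TITLE` variable are assumptions made for this example; check the frontend's own config module for the names your version actually reads. `ServiceSettings` and `load_settings` are hypothetical helpers.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServiceSettings:
    """Hypothetical settings holder for this sketch."""
    search_base: str
    metadata_base: str
    site_title: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> ServiceSettings:
    """Read settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    # Variable names and default ports below are assumptions for illustration.
    search = env.get("SEARCHSERVICE_BASE", "http://localhost:5001")
    metadata = env.get("METADATASERVICE_BASE", "http://localhost:5002")
    return ServiceSettings(
        search_base=search.rstrip("/"),      # normalize trailing slash
        metadata_base=metadata.rstrip("/"),
        site_title=env.get("SITE_TITLE", "Amundsen"),
    )
```

Passing the mapping explicitly, as in `load_settings({"SEARCHSERVICE_BASE": "http://search:5001/"})`, keeps the function easy to test without touching the real process environment.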

Key Features

  • Popularity-based search ranking surfaces the most-used tables first
  • Column-level descriptions and tags help analysts understand schema semantics
  • Data preview shows sample rows without leaving the catalog UI
  • Programmatic descriptions allow dbt or Airflow to push documentation automatically
  • Badge system highlights certified, deprecated, or PII-containing datasets
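To make popularity-based ranking concrete, here is a minimal sketch that combines a text match with a usage signal. The scoring formula is an assumption chosen for illustration and is not Amundsen's actual Elasticsearch ranking; `score` and `rank` are hypothetical helpers, and the table dictionaries are made-up sample data.

```python
import math
from typing import Dict, List, Union

Table = Dict[str, Union[str, int]]


def score(query: str, table: Table) -> float:
    """Score one table: matched query terms, boosted by log-scaled usage."""
    terms = query.lower().split()
    text = f"{table['name']} {table['description']}".lower()
    matches = sum(1 for term in terms if term in text)
    if matches == 0:
        return 0.0  # tables that do not match are never returned
    # log1p dampens very large usage counts so they do not swamp relevance.
    return matches * (1.0 + math.log1p(int(table["usage_count"])))


def rank(query: str, tables: List[Table]) -> List[str]:
    """Return matching table names, highest score first, ties by name."""
    scored = [(score(query, t), str(t["name"])) for t in tables]
    hits = [(s, name) for s, name in scored if s > 0]
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in hits]


tables: List[Table] = [
    {"name": "rides_tmp", "description": "Scratch copy of rides", "usage_count": 3},
    {"name": "rides_daily", "description": "Daily rides rollup", "usage_count": 900},
    {"name": "users", "description": "Registered user accounts", "usage_count": 5000},
]
print(rank("rides", tables))
```

With this sample data the heavily used `rides_daily` outranks `rides_tmp`, and `users` is excluded despite its high usage because it does not match the query.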

Comparison with Similar Tools

  • DataHub — DataHub, originally built at LinkedIn, is a more recent metadata platform with a richer UI and fine-grained access control; Amundsen is lighter, simpler to deploy, and focused on search and discovery
  • Apache Atlas — Atlas focuses on governance and lineage for Hadoop; Amundsen adds a discovery-first search experience
  • OpenMetadata — OpenMetadata is a newer all-in-one platform; Amundsen has a longer production track record at Lyft scale
  • Marquez — Marquez is a lineage-focused metadata service; Amundsen provides a full search and catalog UI

FAQ

Q: What databases can Amundsen index? A: Amundsen supports Hive, PostgreSQL, MySQL, Redshift, BigQuery, Snowflake, Presto, Delta Lake, and many others through Databuilder extractors.

Q: Does Amundsen support data lineage? A: Yes. Amundsen displays table-level and column-level lineage when the metadata is ingested from tools like Airflow or dbt.

Q: Can I add custom metadata to tables? A: Yes. You can add tags, descriptions, owners, and badges both through the UI and programmatically via the metadata API.
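A programmatic tag update might look like the sketch below, which only builds the HTTP request and does not send it. The `/table/{table_key}/tag/{tag}` route, the `PUT` method, and the base URL are assumptions about the metadata service made for this example, so verify them against the API documentation for your deployed version. `build_tag_request` is a hypothetical helper.

```python
from urllib.parse import quote
from urllib.request import Request


def build_tag_request(metadata_base: str, table_key: str, tag: str) -> Request:
    """Build (but do not send) a request that attaches a tag to a table."""
    base = metadata_base.rstrip("/")
    # Keep ':' and '/' so the table key stays readable in the path (assumed route shape).
    key_part = quote(table_key, safe=":/")
    tag_part = quote(tag, safe="")
    return Request(f"{base}/table/{key_part}/tag/{tag_part}", method="PUT")


request = build_tag_request(
    "http://localhost:5002",        # assumed local metadata service address
    "hive://gold.core/fact_rides",  # made-up table key
    "pii",
)
print(request.get_method(), request.full_url)
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen`, ideally with whatever authentication headers your deployment requires.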

Q: How does Amundsen handle authentication? A: Amundsen supports OIDC-based authentication and can integrate with your existing SSO provider.
