Configs · Jul 2, 2026 · 3 min read

Apache Gravitino — Unified Metadata Lake for Data and AI

Apache Gravitino is a metadata lake that provides a single catalog interface to manage schemas, tables, models, and topics across multiple data sources, query engines, and AI platforms.

Agent-ready

Install with prior review

This asset requires review. The copied prompt asks for a dry-run, shows the pending writes, and proceeds only after confirmation.

Needs Confirmation · 64/100 · Policy: confirm
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established
Entry point: Apache Gravitino Overview
Command with prior review:
npx -y tokrepo@latest install 4b259937-75f1-11f1-9bc6-00163e2b0d79 --target codex

Run a dry-run first, confirm the writes, then run this command.

Introduction

Apache Gravitino is a metadata management platform that unifies catalog operations across heterogeneous data sources and AI systems. Instead of managing separate metadata stores for each engine, Gravitino provides a single entry point for schema, table, model, and topic management.

What Apache Gravitino Does

  • Provides a unified metadata catalog spanning relational databases, data lakes, and messaging systems
  • Manages metadata for Hive, Iceberg, JDBC catalogs, Kafka topics, and ML model registries
  • Enables cross-engine metadata sharing between Spark, Trino, Flink, and other query engines
  • Supports multi-tenant metalakes with role-based access control
  • Offers REST, Java, and Python APIs plus a web management UI

Architecture Overview

Gravitino introduces the concept of a metalake, a top-level namespace that groups catalogs from different data sources. Each catalog connects to a backend system (Hive Metastore, JDBC database, Iceberg REST catalog, Kafka cluster) via provider plugins. The Gravitino server exposes a REST API that translates unified metadata operations into backend-specific calls. An event listener framework enables audit logging and downstream notifications when metadata changes.
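To make the metalake hierarchy concrete, here is a minimal Python sketch that maps a metalake plus a dotted `catalog.schema.table` name onto a REST path. It assumes the `/api/metalakes/{metalake}/catalogs/{catalog}/schemas/{schema}/tables/{table}` layout from the Gravitino REST API docs, and that topics and models use `topics` and `models` at the last level. `rest_path` and `rest_path_from_dotted` are hypothetical helpers, not part of any Gravitino client.

```python
from urllib.parse import quote


def rest_path(*names: str, leaf_kind: str = "tables") -> str:
    """Map (metalake, catalog, schema, leaf) names to an assumed Gravitino REST path.

    leaf_kind is "tables" for relational catalogs; "topics" (messaging catalogs)
    and "models" (model catalogs) are assumed to follow the same layout.
    """
    levels = ("metalakes", "catalogs", "schemas", leaf_kind)
    if not 1 <= len(names) <= len(levels):
        raise ValueError(f"expected 1 to {len(levels)} name parts, got {len(names)}")
    segments = ["api"]
    for collection, name in zip(levels, names):
        # Percent-encode each name so characters such as '/' cannot alter the path.
        segments += [collection, quote(name, safe="")]
    return "/" + "/".join(segments)


def rest_path_from_dotted(metalake: str, dotted: str, leaf_kind: str = "tables") -> str:
    """Resolve a dotted 'catalog.schema.table' name inside one metalake."""
    return rest_path(metalake, *dotted.split("."), leaf_kind=leaf_kind)


print(rest_path_from_dotted("analytics", "hive_prod.sales.orders"))
# /api/metalakes/analytics/catalogs/hive_prod/schemas/sales/tables/orders
```

Percent-encoding each segment keeps a name that contains `/` from being read as an extra level of the hierarchy.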

Self-Hosting & Configuration

  • Download the release tarball or build from source with Gradle
  • Configure conf/gravitino.conf with the server port and backend storage settings
  • Register catalogs via the REST API or web UI, specifying the provider and connection details
  • Set up a relational backend (MySQL or PostgreSQL) for production metadata persistence
  • Deploy behind a reverse proxy with TLS for production environments
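A production server configuration following the steps above might be generated like this. The property keys (`gravitino.server.webserver.httpPort`, `gravitino.entity.store.relational.jdbcUrl`, and so on) reflect our reading of the Gravitino server configuration docs and should be checked against your release. `render_conf` and `production_settings` are hypothetical helpers, and the MySQL driver class is only an example.

```python
def render_conf(settings: dict) -> str:
    """Render settings as sorted 'key = value' lines, the properties style of gravitino.conf."""
    return "\n".join(f"{key} = {value}" for key, value in sorted(settings.items())) + "\n"


def production_settings(jdbc_url: str, user: str, password: str,
                        driver: str = "com.mysql.cj.jdbc.Driver", port: int = 8090) -> dict:
    """Assumed keys for an HTTP port plus a MySQL/PostgreSQL-backed entity store."""
    return {
        "gravitino.server.webserver.httpPort": str(port),
        # Persist metadata in an external relational database (assumed key names).
        "gravitino.entity.store": "relational",
        "gravitino.entity.store.relational": "JDBCBackend",
        "gravitino.entity.store.relational.jdbcUrl": jdbc_url,
        "gravitino.entity.store.relational.jdbcDriver": driver,
        "gravitino.entity.store.relational.jdbcUser": user,
        "gravitino.entity.store.relational.jdbcPassword": password,
    }


print(render_conf(production_settings("jdbc:mysql://db:3306/gravitino", "gravitino", "change-me")))
```

Write the output to `conf/gravitino.conf` and keep the real password out of version control, for example by injecting it at deploy time.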

Key Features

  • Unified catalog interface for Hive, Iceberg, JDBC, Kafka, and model registries
  • Metalake concept provides multi-tenant isolation for different teams or projects
  • Cross-engine metadata sharing eliminates catalog duplication between Spark, Trino, and Flink
  • Tag-based metadata classification and governance across all managed assets
  • Event listener framework for audit trails and automated metadata workflows
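Tag-based classification is exposed over REST as well. The sketch below builds, but does not send, two requests: one that creates a tag in a metalake and one that attaches it to a schema. The `/tags` and `/objects/{type}/{fullName}/tags` paths and the `tagsToAdd`/`tagsToRemove` fields are assumptions based on the Gravitino tag documentation; the helper names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request

GRAVITINO_ACCEPT = "application/vnd.gravitino.v1+json"


def _json_request(url: str, method: str, body: dict) -> Request:
    """Build a JSON request with the versioned Accept header used in Gravitino REST examples."""
    return Request(url, data=json.dumps(body).encode("utf-8"), method=method,
                   headers={"Accept": GRAVITINO_ACCEPT, "Content-Type": "application/json"})


def create_tag_request(base_url: str, metalake: str, name: str,
                       comment: str = "", properties=None) -> Request:
    """Hypothetical helper: POST a new tag into a metalake (assumed path)."""
    url = f"{base_url}/api/metalakes/{quote(metalake, safe='')}/tags"
    return _json_request(url, "POST", {"name": name, "comment": comment,
                                       "properties": properties or {}})


def tag_object_request(base_url: str, metalake: str, object_type: str, full_name: str,
                       add=(), remove=()) -> Request:
    """Hypothetical helper: add/remove tags on an object such as 'schema' 'hive_prod.sales'."""
    url = (f"{base_url}/api/metalakes/{quote(metalake, safe='')}/objects/"
           f"{object_type}/{quote(full_name, safe='')}/tags")
    return _json_request(url, "POST", {"tagsToAdd": list(add), "tagsToRemove": list(remove)})
```

Pass either request to `urllib.request.urlopen` to send it to a running server.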

Comparison with Similar Tools

  • Hive Metastore — Hive-centric catalog; Gravitino unifies Hive with Iceberg, JDBC, Kafka, and more
  • Unity Catalog — Databricks-originated; Gravitino is vendor-neutral and Apache-governed
  • Apache Polaris — Iceberg-focused catalog; Gravitino covers a broader range of data and AI assets
  • DataHub — metadata discovery and lineage; Gravitino is an operational catalog for query engines
  • OpenMetadata — metadata platform; Gravitino serves as an active catalog that engines query directly

FAQ

Q: What is a metalake? A: A metalake is the top-level organizational unit in Gravitino. It groups multiple catalogs (Hive, Iceberg, JDBC, Kafka) under a single namespace for unified management.
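Creating a metalake is a single REST call. This minimal sketch builds the request with the Python standard library; it assumes the `application/vnd.gravitino.v1+json` Accept header and `/api/metalakes` endpoint shown in the Gravitino REST examples, and the URL in the test uses port 8090, which we assume is the server default. `create_metalake_request` is a hypothetical helper.

```python
import json
from urllib.request import Request


def create_metalake_request(base_url: str, name: str, comment: str = "") -> Request:
    """Build (but do not send) a POST that creates a metalake on a Gravitino server."""
    body = {"name": name, "comment": comment, "properties": {}}
    return Request(
        f"{base_url.rstrip('/')}/api/metalakes",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Accept": "application/vnd.gravitino.v1+json",
                 "Content-Type": "application/json"},
    )
```

Catalogs are then created under the new metalake's path, so every team or project can get its own namespace.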

Q: Which query engines can use Gravitino? A: Gravitino provides connectors for Apache Spark, Trino, and Apache Flink. Applications can also use the REST or Java/Python client APIs directly.
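For Spark, the connector is enabled through session configuration. The keys below (`spark.plugins`, `spark.sql.gravitino.uri`, `spark.sql.gravitino.metalake`) follow the Gravitino Spark connector docs as we understand them; verify them for your connector version, whose jar must also be on the Spark classpath. `gravitino_spark_conf` and `apply_conf` are hypothetical helpers.

```python
def gravitino_spark_conf(uri: str, metalake: str) -> dict:
    """Assumed Spark settings that load the Gravitino connector plugin for one metalake."""
    return {
        "spark.plugins": "org.apache.gravitino.spark.connector.plugin.GravitinoSparkPlugin",
        "spark.sql.gravitino.uri": uri,
        "spark.sql.gravitino.metalake": metalake,
    }


def apply_conf(builder, conf: dict):
    """Apply each setting via builder.config(key, value), e.g. on SparkSession.builder."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder


# Usage with PySpark (not executed here; requires pyspark and the connector jar):
# from pyspark.sql import SparkSession
# spark = apply_conf(SparkSession.builder.appName("gravitino-demo"),
#                    gravitino_spark_conf("http://localhost:8090", "analytics")).getOrCreate()
# spark.sql("SHOW CATALOGS").show()
```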

Q: Does Gravitino replace Hive Metastore? A: Gravitino can sit in front of Hive Metastore and other catalogs, providing a unified interface. It does not replace the backends but adds a unification layer.
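Putting Gravitino in front of an existing Hive Metastore amounts to registering a catalog that points at it. This sketch assumes the `hive` provider, the `RELATIONAL` catalog type, and the `metastore.uris` property from the Gravitino Hive catalog docs; the metastore address and `register_hive_catalog_request` are illustrative.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def register_hive_catalog_request(base_url: str, metalake: str, catalog: str,
                                  metastore_uri: str, comment: str = "") -> Request:
    """Build (but do not send) a POST registering an existing Hive Metastore as a catalog."""
    body = {
        "name": catalog,
        "type": "RELATIONAL",
        "provider": "hive",
        "comment": comment,
        # The metastore keeps owning the metadata; Gravitino forwards operations to it.
        "properties": {"metastore.uris": metastore_uri},
    }
    return Request(
        f"{base_url}/api/metalakes/{quote(metalake, safe='')}/catalogs",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Accept": "application/vnd.gravitino.v1+json",
                 "Content-Type": "application/json"},
    )
```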

Q: Is Gravitino production-ready? A: Apache Gravitino graduated from the Apache Incubator to become a top-level Apache Software Foundation project in 2025. It is under active development and has growing production adoption.

Similar assets