Configs · Jul 2, 2026 · 3 min read

Apache Gravitino — Unified Metadata Lake for Data and AI

Apache Gravitino is a metadata lake that provides a single catalog interface to manage schemas, tables, models, and topics across multiple data sources, query engines, and AI platforms.

Agent-ready

Install with prior review

This asset requires review. The copied prompt asks for a dry run, shows the writes, and continues only after confirmation.

Needs Confirmation · 64/100 · Policy: confirm
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established
Entry: Apache Gravitino Overview
Command with prior review:
npx -y tokrepo@latest install 4b259937-75f1-11f1-9bc6-00163e2b0d79 --target codex

Do a dry run first, confirm the writes, and then run this command.

Introduction

Apache Gravitino is a metadata management platform that unifies catalog operations across heterogeneous data sources and AI systems. Instead of managing separate metadata stores for each engine, Gravitino provides a single entry point for schema, table, model, and topic management.

What Apache Gravitino Does

  • Provides a unified metadata catalog spanning relational databases, data lakes, and messaging systems
  • Manages metadata for Hive, Iceberg, JDBC catalogs, Kafka topics, and ML model registries
  • Enables cross-engine metadata sharing between Spark, Trino, Flink, and other query engines
  • Supports multi-tenant metalakes with role-based access control
  • Offers REST, Java, and Python APIs plus a web management UI

Architecture Overview

Gravitino introduces the concept of a metalake, a top-level namespace that groups catalogs from different data sources. Each catalog connects to a backend system (Hive Metastore, JDBC database, Iceberg REST catalog, Kafka cluster) via provider plugins. The Gravitino server exposes a REST API that translates unified metadata operations into backend-specific calls. An event listener framework enables audit logging and downstream notifications when metadata changes.
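The namespace hierarchy described above (metalake, then catalog, then schema, then table) can be sketched as a small path builder. This is an illustration only: the MetadataPath class is hypothetical, and the /api/metalakes/.../catalogs/.../schemas/.../tables/... layout reflects how the REST resources are commonly documented, so check it against the REST API reference for your Gravitino version.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote


@dataclass(frozen=True)
class MetadataPath:
    """Hypothetical helper: a position in the metalake -> catalog -> schema -> table hierarchy."""
    metalake: str
    catalog: Optional[str] = None
    schema: Optional[str] = None
    table: Optional[str] = None

    def rest_path(self) -> str:
        """Build the REST resource path, stopping at the first level that is not set."""
        levels = [
            ("metalakes", self.metalake),
            ("catalogs", self.catalog),
            ("schemas", self.schema),
            ("tables", self.table),
        ]
        parts = ["/api"]  # assumed API prefix
        for collection, name in levels:
            if name is None:
                break
            # Percent-encode each name so it is safe inside a URL segment.
            parts.append(f"{collection}/{quote(name, safe='')}")
        return "/".join(parts)


# Example names ("demo", "hive_prod", "sales", "orders") are made up.
print(MetadataPath("demo").rest_path())
print(MetadataPath("demo", "hive_prod", "sales", "orders").rest_path())
```

Every object is addressed relative to its metalake, which is what lets one server front several backends without name clashes between them.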

Self-Hosting & Configuration

  • Download the release tarball or build from source with Gradle
  • Configure conf/gravitino.conf with the server port and backend storage settings
  • Register catalogs via the REST API or web UI, specifying the provider and connection details
  • Set up a relational backend (MySQL or PostgreSQL) for production metadata persistence
  • Deploy behind a reverse proxy with TLS for production environments
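As a rough sketch of the catalog registration step above, the following builds (but does not send) a REST request that registers a Hive catalog in a metalake. Several details are assumptions rather than facts taken from this page: the server address http://localhost:8090, the Accept header value, the payload field names, and the metastore.uris property. The names demo, hive_prod, and hive-metastore are made up. Verify all of them against the REST API reference before use.

```python
import json
from urllib.request import Request

GRAVITINO_URL = "http://localhost:8090"  # assumption: a locally running server


def build_catalog_request(metalake: str, name: str, provider: str, properties: dict) -> Request:
    """Build a POST request that registers a catalog under the given metalake."""
    payload = {
        "name": name,
        "type": "RELATIONAL",      # assumed catalog type for table-based backends
        "provider": provider,      # e.g. "hive"; selects the provider plugin
        "comment": "",
        "properties": properties,  # backend connection details
    }
    return Request(
        url=f"{GRAVITINO_URL}/api/metalakes/{metalake}/catalogs",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/vnd.gravitino.v1+json",  # assumed versioned media type
        },
        method="POST",
    )


request = build_catalog_request(
    metalake="demo",
    name="hive_prod",
    provider="hive",
    properties={"metastore.uris": "thrift://hive-metastore:9083"},
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would send it to a running server. Building it separately makes it easy to inspect the payload first, which fits the dry-run-then-confirm workflow.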

Key Features

  • Unified catalog interface for Hive, Iceberg, JDBC, Kafka, and model registries
  • Metalake concept provides multi-tenant isolation for different teams or projects
  • Cross-engine metadata sharing eliminates catalog duplication between Spark, Trino, and Flink
  • Tag-based metadata classification and governance across all managed assets
  • Event listener framework for audit trails and automated metadata workflows

Comparison with Similar Tools

  • Hive Metastore — Hive-centric catalog; Gravitino unifies Hive with Iceberg, JDBC, Kafka, and more
  • Unity Catalog — Databricks-originated; Gravitino is vendor-neutral and Apache-governed
  • Apache Polaris — Iceberg-focused catalog; Gravitino covers a broader range of data and AI assets
  • DataHub — metadata discovery and lineage; Gravitino is an operational catalog for query engines
  • OpenMetadata — metadata platform; Gravitino serves as an active catalog that engines query directly

FAQ

Q: What is a metalake? A: A metalake is the top-level organizational unit in Gravitino. It groups multiple catalogs (Hive, Iceberg, JDBC, Kafka) under a single namespace for unified management.

Q: Which query engines can use Gravitino? A: Gravitino provides connectors for Apache Spark, Trino, and Apache Flink. Applications can also use the REST or Java/Python client APIs directly.
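For the Spark connector, wiring a session to Gravitino is mostly a matter of configuration. The sketch below only assembles the settings. The plugin class name and the spark.sql.gravitino.* keys are recalled from the Spark connector documentation and should be treated as assumptions to verify for your version. The helper functions, server address, and metalake name are hypothetical.

```python
def gravitino_spark_conf(server_uri: str, metalake: str) -> dict:
    """Return Spark settings that (by assumption) enable the Gravitino connector."""
    return {
        "spark.plugins": "org.apache.gravitino.spark.connector.plugin.GravitinoSparkPlugin",
        "spark.sql.gravitino.uri": server_uri,
        "spark.sql.gravitino.metalake": metalake,
    }


def as_cli_args(conf: dict) -> list:
    """Flatten the settings into --conf key=value pairs for spark-sql or spark-submit."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


conf = gravitino_spark_conf("http://localhost:8090", "demo")
print(" ".join(as_cli_args(conf)))
```

The connector JAR must also be on the Spark classpath. With this in place, catalogs registered in the metalake are expected to show up as Spark catalogs without per-catalog Spark configuration.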

Q: Does Gravitino replace Hive Metastore? A: Gravitino can sit in front of Hive Metastore and other catalogs, providing a unified interface. It does not replace the backends but adds a unification layer.

Q: Is Gravitino production-ready? A: Apache Gravitino graduated from the Apache Incubator to a top-level Apache Software Foundation project in 2025. It is under active development and has growing production adoption.
