Introduction
Apache Gravitino is a metadata management platform that unifies catalog operations across heterogeneous data sources and AI systems. Instead of managing separate metadata stores for each engine, Gravitino provides a single entry point for schema, table, model, and topic management.
What Apache Gravitino Does
- Provides a unified metadata catalog spanning relational databases, data lakes, and messaging systems
- Manages metadata for Hive, Iceberg, JDBC catalogs, Kafka topics, and ML model registries
- Enables cross-engine metadata sharing between Spark, Trino, Flink, and other query engines
- Supports multi-tenant metalakes with role-based access control
- Offers REST, Java, and Python APIs plus a web management UI
Architecture Overview
Gravitino introduces the concept of a metalake, a top-level namespace that groups catalogs from different data sources. Each catalog connects to a backend system (Hive Metastore, JDBC database, Iceberg REST catalog, Kafka cluster) via provider plugins. The Gravitino server exposes a REST API that translates unified metadata operations into backend-specific calls. An event listener framework enables audit logging and downstream notifications when metadata changes.
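The metalake, catalog, schema, and table hierarchy described above maps onto nested REST paths. The following is a minimal sketch of that mapping, not the official client. It assumes the path layout /api/metalakes/{metalake}/catalogs/{catalog}/schemas/{schema}/tables/{table} and a server on localhost port 8090; the helper name entity_path and the sample names demo_lake, hive_prod, sales, and orders are hypothetical.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:8090"  # assumed address of a local Gravitino server


def entity_path(metalake, catalog=None, schema=None, table=None):
    """Build the REST path for a metalake, catalog, schema, or table.

    Hypothetical helper: the path layout is an assumption and should be
    checked against the REST API reference for the release in use.
    """
    levels = [
        ("metalakes", metalake),
        ("catalogs", catalog),
        ("schemas", schema),
        ("tables", table),
    ]
    parts = ["api"]
    for collection, name in levels:
        if name is None:
            break  # stop at the deepest level the caller supplied
        parts += [collection, quote(name, safe="")]
    return "/" + "/".join(parts)


print(BASE_URL + entity_path("demo_lake", "hive_prod", "sales", "orders"))
```

Each level in the path narrows the scope by one step, which is why a single server can address assets that live in different backends through one URL scheme.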
Self-Hosting & Configuration
- Download the release tarball or build from source with Gradle
- Configure conf/gravitino.conf with the server port and backend storage settings
- Register catalogs via the REST API or web UI, specifying the provider and connection details
- Set up a relational backend (MySQL or PostgreSQL) for production metadata persistence rather than relying on the default embedded H2 store
- Deploy behind a reverse proxy with TLS for production environments
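Registering a catalog through the REST API, as listed above, could look roughly like the sketch below. It only builds the request and does not send it. The endpoint path, the Accept media type, the body fields (name, type, provider, comment, properties), and the Hive property key metastore.uris are assumptions to verify against the documentation for your release; the function name build_catalog_request and all sample values are hypothetical.

```python
import json
import urllib.request


def build_catalog_request(base_url, metalake, name, provider, properties,
                          catalog_type="RELATIONAL", comment=""):
    """Build (but do not send) a POST request that registers a catalog.

    Hypothetical helper: the endpoint and body fields are assumptions
    about the Gravitino REST API, not a verified client implementation.
    """
    body = {
        "name": name,
        "type": catalog_type,      # assumed type label for table catalogs
        "provider": provider,      # selects the backend plugin, e.g. "hive"
        "comment": comment,
        "properties": properties,  # backend connection details
    }
    return urllib.request.Request(
        url=f"{base_url}/api/metalakes/{metalake}/catalogs",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/vnd.gravitino.v1+json",  # assumed media type
        },
    )


request = build_catalog_request(
    "http://localhost:8090",
    "demo_lake",
    "hive_prod",
    "hive",
    {"metastore.uris": "thrift://hive-metastore:9083"},  # placeholder address
)
print(request.get_method(), request.full_url)
# To send it against a running server: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes it easy to inspect the payload before it touches a real server, which is useful when the provider name or connection properties are still being worked out.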
Key Features
- Unified catalog interface for Hive, Iceberg, JDBC, Kafka, and model registries
- Metalake concept provides multi-tenant isolation for different teams or projects
- Cross-engine metadata sharing eliminates catalog duplication between Spark, Trino, and Flink
- Tag-based metadata classification and governance across all managed assets
- Event listener framework for audit trails and automated metadata workflows
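Cross-engine sharing usually comes down to pointing each engine at the same Gravitino server and metalake. As a hedged illustration for Spark, the sketch below assembles the session properties that the Spark connector is expected to read. The property keys and the plugin class name are assumptions based on the connector's documented configuration and may differ between releases; the function name gravitino_spark_conf, the server address, and the metalake name are hypothetical. The connector runtime JAR would also need to be on the Spark classpath.

```python
def gravitino_spark_conf(server_uri, metalake):
    """Return Spark properties that point a session at a Gravitino server.

    Hypothetical helper: the keys and plugin class below are assumptions
    and should be checked against the Spark connector documentation.
    """
    return {
        "spark.plugins": (
            "org.apache.gravitino.spark.connector.plugin.GravitinoSparkPlugin"
        ),
        "spark.sql.gravitino.uri": server_uri,
        "spark.sql.gravitino.metalake": metalake,
    }


conf = gravitino_spark_conf("http://localhost:8090", "demo_lake")

# Print the properties as spark-submit style flags for inspection.
for key, value in sorted(conf.items()):
    print(f"--conf {key}={value}")
```

With settings like these, catalogs registered in the metalake would be expected to appear in Spark under their Gravitino names, so the same table definitions can be reached from Trino or Flink through their own connectors without being declared again.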
Comparison with Similar Tools
- Hive Metastore — Hive-centric catalog; Gravitino unifies Hive with Iceberg, JDBC, Kafka, and more
- Unity Catalog — Databricks-originated; Gravitino is vendor-neutral and Apache-governed
- Apache Polaris — Iceberg-focused catalog; Gravitino covers a broader range of data and AI assets
- DataHub — metadata discovery and lineage; Gravitino is an operational catalog for query engines
- OpenMetadata — metadata platform; Gravitino serves as an active catalog that engines query directly
FAQ
Q: What is a metalake? A: A metalake is the top-level organizational unit in Gravitino. It groups multiple catalogs (Hive, Iceberg, JDBC, Kafka) under a single namespace for unified management.
Q: Which query engines can use Gravitino? A: Gravitino provides connectors for Apache Spark, Trino, and Apache Flink. Applications can also use the REST or Java/Python client APIs directly.
Q: Does Gravitino replace Hive Metastore? A: Gravitino can sit in front of Hive Metastore and other catalogs, providing a unified interface. It does not replace the backends but adds a unification layer.
Q: Is Gravitino production-ready? A: Apache Gravitino graduated from the Apache Incubator to a top-level Apache Software Foundation project in 2025. It is under active development, with growing production adoption.