Configs · Jul 2, 2026 · 3 min read

Apache Gravitino — Unified Metadata Lake for Data and AI

Apache Gravitino is a metadata lake that provides a single catalog interface to manage schemas, tables, models, and topics across multiple data sources, query engines, and AI platforms.

Agent ready

Review-first install path

This asset needs a review step. The copied prompt tells the agent to dry-run, show the writes, then proceed only after confirmation.

Needs Confirmation · 64/100 · Policy: confirm

  • Agent surface: Any MCP/CLI agent
  • Kind: Skill
  • Install: Single
  • Trust: Established
  • Entrypoint: Apache Gravitino Overview

Review-first command:
npx -y tokrepo@latest install 4b259937-75f1-11f1-9bc6-00163e2b0d79 --target codex

Dry-run first, confirm the writes, then run this command.

Introduction

Apache Gravitino is a metadata management platform that unifies catalog operations across heterogeneous data sources and AI systems. Instead of managing separate metadata stores for each engine, Gravitino provides a single entry point for schema, table, model, and topic management.

What Apache Gravitino Does

  • Provides a unified metadata catalog spanning relational databases, data lakes, and messaging systems
  • Manages metadata for Hive, Iceberg, JDBC catalogs, Kafka topics, and ML model registries
  • Enables cross-engine metadata sharing between Spark, Trino, Flink, and other query engines
  • Supports multi-tenant metalakes with role-based access control
  • Offers REST, Java, and Python APIs plus a web management UI

Architecture Overview

Gravitino introduces the concept of a metalake, a top-level namespace that groups catalogs from different data sources. Each catalog connects to a backend system (Hive Metastore, JDBC database, Iceberg REST catalog, Kafka cluster) via provider plugins. The Gravitino server exposes a REST API that translates unified metadata operations into backend-specific calls. An event listener framework enables audit logging and downstream notifications when metadata changes.
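The namespace hierarchy described above (metalake, then catalog, then schema, then table) can be sketched as a small path builder. This is a minimal illustration, not Gravitino's client code: the helper name `metadata_path`, the `/api` prefix, and the collection names in the URL are assumptions based on the commonly documented REST layout and should be checked against the API reference for your release.

```python
from urllib.parse import quote

API_ROOT = "/api"  # assumed REST prefix; verify against your Gravitino release


def metadata_path(metalake, catalog=None, schema=None, table=None):
    """Build the REST path for an object in the metalake hierarchy.

    Hypothetical helper: each level may only be given if its parent is given,
    mirroring metalake -> catalog -> schema -> table nesting.
    """
    levels = [
        ("metalakes", metalake),
        ("catalogs", catalog),
        ("schemas", schema),
        ("tables", table),
    ]
    parts = [API_ROOT.strip("/")]
    seen_gap = False
    for collection, name in levels:
        if name is None:
            seen_gap = True
            continue
        if seen_gap:
            # A child level was supplied without its parent level.
            raise ValueError(f"'{collection}' given without its parent level")
        parts.extend([collection, quote(name, safe="")])
    return "/" + "/".join(parts)


print(metadata_path("demo_lake", "hive_prod", "sales", "orders"))
# prints: /api/metalakes/demo_lake/catalogs/hive_prod/schemas/sales/tables/orders
```

The same function addresses any level of the hierarchy, so `metadata_path("demo_lake")` points at the metalake itself and `metadata_path("demo_lake", "hive_prod")` at one catalog inside it.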

Self-Hosting & Configuration

  • Download the release tarball or build from source with Gradle
  • Configure conf/gravitino.conf with the server port and backend storage settings
  • Register catalogs via the REST API or web UI, specifying the provider and connection details
  • Set up a relational backend (MySQL or PostgreSQL) for production metadata persistence
  • Deploy behind a reverse proxy with TLS for production environments
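Registering a catalog through the REST API, as mentioned in the steps above, could look roughly like the following. This sketch only builds the request and does not send it. The endpoint path, the `Accept` media type, the payload field names, the `localhost:8090` address, the `hive` provider name, and the `metastore.uris` property are assumptions recalled from the Gravitino documentation; the metalake and catalog names are made up. Confirm all of them against the docs for your version.

```python
import json
from urllib.request import Request


def build_catalog_request(base_url, metalake, name, provider, properties,
                          catalog_type="RELATIONAL", comment=""):
    """Build (but do not send) a catalog-registration request.

    Hypothetical helper; field names and URL layout are assumptions.
    """
    payload = {
        "name": name,
        "type": catalog_type,      # assumed enum value for table-style catalogs
        "provider": provider,      # e.g. "hive"; provider names are assumptions
        "comment": comment,
        "properties": properties,  # backend connection details
    }
    return Request(
        url=f"{base_url.rstrip('/')}/api/metalakes/{metalake}/catalogs",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/vnd.gravitino.v1+json",  # assumed media type
        },
        method="POST",
    )


req = build_catalog_request(
    "http://localhost:8090",  # assumed local server address
    "demo_lake",
    "hive_prod",
    "hive",
    {"metastore.uris": "thrift://localhost:9083"},
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Against a running server, the request could then be sent with `urllib.request.urlopen(req)`. Keeping construction separate from sending makes it easy to review the payload first, which fits a dry-run workflow.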

Key Features

  • Unified catalog interface for Hive, Iceberg, JDBC, Kafka, and model registries
  • Metalake concept provides multi-tenant isolation for different teams or projects
  • Cross-engine metadata sharing eliminates catalog duplication between Spark, Trino, and Flink
  • Tag-based metadata classification and governance across all managed assets
  • Event listener framework for audit trails and automated metadata workflows
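To make the event listener idea concrete, here is a conceptual sketch of the pattern: metadata operations publish events, and registered listeners (for example an audit logger) react to them. This is not Gravitino's plugin API, which runs as server-side plugins; the names `MetadataEvent`, `EventBus`, and `audit_listener` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class MetadataEvent:
    """Hypothetical event describing one metadata change."""
    operation: str   # e.g. "create_table"
    identifier: str  # e.g. "demo_lake.hive_prod.sales.orders"
    user: str


class EventBus:
    """Hypothetical dispatcher that fans events out to registered listeners."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[MetadataEvent], None]] = []

    def register(self, listener: Callable[[MetadataEvent], None]) -> None:
        self._listeners.append(listener)

    def publish(self, event: MetadataEvent) -> None:
        for listener in self._listeners:
            try:
                listener(event)
            except Exception as exc:
                # Design choice in this sketch: a failing listener is reported
                # but does not stop delivery to the remaining listeners.
                print(f"listener failed: {exc}")


audit_log: List[str] = []


def audit_listener(event: MetadataEvent) -> None:
    """Append a one-line audit record for every event."""
    audit_log.append(f"{event.user} {event.operation} {event.identifier}")


bus = EventBus()
bus.register(audit_listener)
bus.publish(MetadataEvent("create_table", "demo_lake.hive_prod.sales.orders", "alice"))
print(audit_log)
# prints: ['alice create_table demo_lake.hive_prod.sales.orders']
```

A notification listener would be registered the same way as the audit listener, so audit trails and downstream workflows can be added without changing the code that performs the metadata operation.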

Comparison with Similar Tools

  • Hive Metastore — Hive-centric catalog; Gravitino unifies Hive with Iceberg, JDBC, Kafka, and more
  • Unity Catalog — Databricks-originated; Gravitino is vendor-neutral and Apache-governed
  • Apache Polaris — Iceberg-focused catalog; Gravitino covers a broader range of data and AI assets
  • DataHub — metadata discovery and lineage; Gravitino is an operational catalog for query engines
  • OpenMetadata — metadata platform; Gravitino serves as an active catalog that engines query directly

FAQ

Q: What is a metalake? A: A metalake is the top-level organizational unit in Gravitino. It groups multiple catalogs (Hive, Iceberg, JDBC, Kafka) under a single namespace for unified management.

Q: Which query engines can use Gravitino? A: Gravitino provides connectors for Apache Spark, Trino, and Apache Flink. Applications can also use the REST or Java/Python client APIs directly.

Q: Does Gravitino replace Hive Metastore? A: Gravitino can sit in front of Hive Metastore and other catalogs, providing a unified interface. It does not replace the backends but adds a unification layer.

Q: Is Gravitino production-ready? A: Apache Gravitino is a top-level project of the Apache Software Foundation (it graduated from the Apache Incubator in 2025). It is under active development and has growing production adoption.
