Configs · July 2, 2026 · 1 min read

Apache Gravitino — Unified Metadata Lake for Data and AI

Apache Gravitino is a metadata lake that provides a single catalog interface to manage schemas, tables, models, and topics across multiple data sources, query engines, and AI platforms.


Introduction

Apache Gravitino is a metadata management platform that unifies catalog operations across heterogeneous data sources and AI systems. Instead of managing separate metadata stores for each engine, Gravitino provides a single entry point for schema, table, model, and topic management.

What Apache Gravitino Does

  • Provides a unified metadata catalog spanning relational databases, data lakes, and messaging systems
  • Manages metadata for Hive, Iceberg, JDBC catalogs, Kafka topics, and ML model registries
  • Enables cross-engine metadata sharing between Spark, Trino, Flink, and other query engines
  • Supports multi-tenant metalakes with role-based access control
  • Offers REST, Java, and Python APIs plus a web management UI

Architecture Overview

Gravitino introduces the concept of a metalake, a top-level namespace that groups catalogs from different data sources. Each catalog connects to a backend system (Hive Metastore, JDBC database, Iceberg REST catalog, Kafka cluster) via provider plugins. The Gravitino server exposes a REST API that translates unified metadata operations into backend-specific calls. An event listener framework enables audit logging and downstream notifications when metadata changes.
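The namespace hierarchy described above can be pictured as nested REST resources. The following is a minimal sketch, assuming the server exposes paths of the form `/api/metalakes/{metalake}/catalogs/{catalog}/schemas/{schema}/tables/{table}`. The helper name `entity_path` and the example entity names are hypothetical and not part of any Gravitino client.

```python
from urllib.parse import quote

API_ROOT = "/api"  # assumed REST prefix; verify against your server version


def entity_path(metalake, catalog=None, schema=None, table=None):
    """Build the REST path for a metalake, catalog, schema, or table.

    Hypothetical helper: it only mirrors the metalake -> catalog ->
    schema -> table nesting and sends nothing over the network.
    """
    parts = ["metalakes", metalake]
    levels = [("catalogs", catalog), ("schemas", schema), ("tables", table)]
    seen_gap = False
    for collection, name in levels:
        if name is None:
            seen_gap = True
            continue
        if seen_gap:
            # e.g. a table given without its schema: the hierarchy is broken
            raise ValueError(f"{collection} given without its parent level")
        parts += [collection, name]
    # Percent-encode each segment so unusual names cannot break the path
    return API_ROOT + "/" + "/".join(quote(p, safe="") for p in parts)


print(entity_path("demo_metalake"))
print(entity_path("demo_metalake", "hive_prod", "sales", "orders"))
```

Each level only makes sense inside its parent, which is why the helper rejects a table name supplied without a schema.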

Self-Hosting & Configuration

  • Download the release tarball or build from source with Gradle
  • Configure conf/gravitino.conf with the server port and backend storage settings
  • Register catalogs via the REST API or web UI, specifying the provider and connection details
  • Set up a relational backend (MySQL or PostgreSQL) for production metadata persistence
  • Deploy behind a reverse proxy with TLS for production environments
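The catalog registration step in the list above can be sketched as a plain HTTP request. This is an illustration under stated assumptions, not the project's official client: it assumes the server listens on `http://localhost:8090`, that catalogs are created with `POST /api/metalakes/{metalake}/catalogs`, and that a Hive catalog takes a `metastore.uris` property. The function name, metalake name, and hostnames are hypothetical. The request is built but not sent, so the snippet runs without a server.

```python
import json
from urllib.request import Request


def build_create_catalog_request(base_url, metalake, name, provider,
                                 properties, catalog_type="RELATIONAL",
                                 comment=""):
    """Build (but do not send) a catalog registration request.

    Field names in the body follow the assumed REST schema; check them
    against the API reference for your Gravitino version.
    """
    body = {
        "name": name,
        "type": catalog_type,      # assumed values: RELATIONAL, MESSAGING, ...
        "provider": provider,      # e.g. "hive" or "kafka"
        "comment": comment,
        "properties": properties,  # provider-specific connection details
    }
    return Request(
        url=f"{base_url.rstrip('/')}/api/metalakes/{metalake}/catalogs",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/vnd.gravitino.v1+json",  # assumed media type
        },
        method="POST",
    )


req = build_create_catalog_request(
    "http://localhost:8090",   # assumed default server address
    "demo_metalake",           # hypothetical metalake
    "hive_prod",               # hypothetical catalog name
    "hive",
    {"metastore.uris": "thrift://hive-metastore.example:9083"},
)
print(req.get_method(), req.full_url)
```

To actually submit it against a running server, pass `req` to `urllib.request.urlopen`. In production the base URL would point at the TLS-terminating reverse proxy rather than the server port directly.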

Key Features

  • Unified catalog interface for Hive, Iceberg, JDBC, Kafka, and model registries
  • Metalake concept provides multi-tenant isolation for different teams or projects
  • Cross-engine metadata sharing eliminates catalog duplication between Spark, Trino, and Flink
  • Tag-based metadata classification and governance across all managed assets
  • Event listener framework for audit trails and automated metadata workflows

Comparison with Similar Tools

  • Hive Metastore — Hive-centric catalog; Gravitino unifies Hive with Iceberg, JDBC, Kafka, and more
  • Unity Catalog — Databricks-originated; Gravitino is vendor-neutral and Apache-governed
  • Apache Polaris — Iceberg-focused catalog; Gravitino covers a broader range of data and AI assets
  • DataHub — metadata discovery and lineage; Gravitino is an operational catalog for query engines
  • OpenMetadata — metadata platform; Gravitino serves as an active catalog that engines query directly

FAQ

Q: What is a metalake? A: A metalake is the top-level organizational unit in Gravitino. It groups multiple catalogs (Hive, Iceberg, JDBC, Kafka) under a single namespace for unified management.

Q: Which query engines can use Gravitino? A: Gravitino provides connectors for Apache Spark, Trino, and Apache Flink. Applications can also use the REST or Java/Python client APIs directly.

Q: Does Gravitino replace Hive Metastore? A: Gravitino can sit in front of Hive Metastore and other catalogs, providing a unified interface. It does not replace the backends but adds a unification layer.

Q: Is Gravitino production-ready? A: Apache Gravitino is a top-level project of the Apache Software Foundation, having graduated from the Apache Incubator in 2025. It is under active development and has growing production adoption.
