Scripts · April 15, 2026 · 1 min read

Cube — Open Source Semantic Layer for Data Apps

Cube is a headless semantic layer that turns your warehouse into a reusable API for BI, embedded analytics, and AI — defining metrics once and serving them via SQL, REST, GraphQL, and MDX.

Introduction

Cube (formerly Cube.js) is an open source semantic layer for building data applications. You define measures, dimensions, and joins once in YAML or JavaScript data model files, optionally generated dynamically with Jinja and Python. Cube then generates SQL against your warehouse, caches the results, and serves them to BI tools, apps, notebooks, and LLM agents via SQL, REST, GraphQL, or MDX. The project has over 19,000 GitHub stars and is used by thousands of teams to keep metric definitions consistent.

What Cube Does

  • Models data with reusable cubes, views, and pre-aggregations defined in code and version-controlled.
  • Translates client queries into optimized warehouse SQL with automatic pre-aggregation routing.
  • Serves the same model through SQL (Postgres-compatible), REST, GraphQL, and MDX endpoints.
  • Handles multi-tenant row-level security with context-aware filters.
  • Provides a playground, developer API, and TypeScript client for embedded analytics.
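
As a rough illustration of the REST endpoint mentioned above, the sketch below builds a request URL for the `/cubejs-api/v1/load` endpoint. The `orders` cube, its members, and the `localhost:4000` base URL are assumptions made for the example, not part of any real deployment; the helper name `build_load_url` is hypothetical.

```python
import json
from urllib.parse import urlencode

# Assumed for illustration: a local Cube deployment on the default port and
# a hypothetical "orders" cube with a "count" measure.
CUBE_API_URL = "http://localhost:4000/cubejs-api/v1"


def build_load_url(query: dict, base_url: str = CUBE_API_URL) -> str:
    """Return a GET URL for the /load endpoint with the query JSON-encoded."""
    return f"{base_url}/load?{urlencode({'query': json.dumps(query)})}"


# A query names members as "<cube>.<member>"; Cube turns it into warehouse SQL.
query = {
    "measures": ["orders.count"],
    "dimensions": ["orders.status"],
    "timeDimensions": [
        {
            "dimension": "orders.created_at",
            "dateRange": "last 30 days",
            "granularity": "day",
        }
    ],
    "limit": 100,
}

url = build_load_url(query)
```

The resulting URL would be sent with an `Authorization` header carrying a signed token; the same query shape can be expressed through the SQL or GraphQL endpoints.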

Architecture Overview

Cube is a Node.js + Rust stack. The schema compiler turns your data model into an intermediate representation; the query orchestrator matches requests to cubes, rewrites them into warehouse SQL, and hits Cube Store (a distributed Arrow-based cache) or the source warehouse directly. A SQL API layer built on DataFusion speaks the Postgres wire protocol so Tableau/Looker/Power BI can query Cube as if it were a database. Authentication is JWT-based with per-user filter injection. Supported warehouses include Snowflake, BigQuery, Redshift, Databricks, ClickHouse, Postgres, MySQL, Trino, DuckDB, Pinot, and more.
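
The routing decision described above can be pictured with a deliberately simplified sketch. This is not Cube's actual matching algorithm; it only shows the general idea that a rollup can answer a query when it contains the requested members at a granularity that is the same or finer. The `Rollup` class, `can_serve`, `route`, and the example member names are all hypothetical.

```python
from dataclasses import dataclass

# Ordered fine to coarse: a "day" rollup can be re-aggregated to "month".
GRANULARITY_ORDER = ["hour", "day", "month", "year"]


@dataclass(frozen=True)
class Rollup:
    name: str
    measures: frozenset
    dimensions: frozenset
    granularity: str


def can_serve(rollup, measures, dimensions, granularity) -> bool:
    """True if the rollup holds every requested member at a fine enough grain.

    Simplification: assumes all measures are additive (counts, sums), so they
    can safely be re-aggregated from the rollup's rows.
    """
    fine_enough = (
        GRANULARITY_ORDER.index(rollup.granularity)
        <= GRANULARITY_ORDER.index(granularity)
    )
    return (
        set(measures) <= rollup.measures
        and set(dimensions) <= rollup.dimensions
        and fine_enough
    )


def route(rollups, measures, dimensions, granularity) -> str:
    """Pick the first matching rollup, else fall back to the warehouse."""
    for rollup in rollups:
        if can_serve(rollup, measures, dimensions, granularity):
            return rollup.name
    return "warehouse"


ROLLUPS = [
    Rollup(
        name="orders_by_status_daily",
        measures=frozenset({"orders.count"}),
        dimensions=frozenset({"orders.status"}),
        granularity="day",
    )
]

target = route(ROLLUPS, ["orders.count"], ["orders.status"], "month")
```

In this toy model a monthly query is served from the daily rollup, while an hourly query or one that asks for a dimension missing from the rollup falls through to the warehouse.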

Self-Hosting & Configuration

  • Run with cubejs-cli + Node, or deploy the official Docker image (cubejs/cube) on Kubernetes.
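  • For production, run the API (cube-api) and refresh worker (cube-refresh-worker) as separate processes and back them with Cube Store. Recent releases also use Cube Store for the query cache and queue, a role that Redis filled in older versions.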
  • Define data models in YAML or JavaScript; Jinja templates and Python can be used to generate models dynamically.
  • Enable pre-aggregations on hot metrics: Cube materializes them on a schedule into Cube Store, which keeps them as Parquet files and can persist them to S3-compatible object storage.
  • Protect the API with signed JWTs and securityContext so every query is scoped to a tenant.

Key Features

  • Define metrics once and consume them everywhere, so every downstream tool reads the same definitions.
  • Pre-aggregations can bring query latency down to sub-second levels, even on multi-billion-row warehouses, when queries match a materialized rollup.
  • Postgres-compatible SQL API plugs into Tableau, Looker, Power BI, Metabase, and Superset with no adapters.
  • First-class support for LLM text-to-metric via the SQL API and Cube Cloud's AI Assistant.
  • Self-hosted Apache-2.0 core that shares its core APIs with the managed Cube Cloud.

Comparison with Similar Tools

  • dbt Semantic Layer / MetricFlow — defines metrics alongside dbt models; query interface is less flexible than Cube.
  • LookML (Looker) — proprietary and tied to Looker UI; Cube is open and API-first.
  • MetricFlow (stand-alone) — now part of dbt; similar goals, fewer integrations.
  • Malloy (Google) — experimental modeling language; not a production semantic layer yet.
  • AtScale: commercial semantic layer; Cube's open source core covers much of the same ground without a license fee.

FAQ

Q: Do I need Cube Cloud? A: No — the open source version is production-grade. Cube Cloud adds managed hosting, SSO, and an AI assistant.

Q: How does Cube compare to just writing views in my warehouse? A: Views lack pre-aggregation routing, caching, multi-protocol APIs, and tenant-scoped security — all of which Cube provides.

Q: Can Cube cache queries? A: Yes, via Cube Store pre-aggregations and an in-memory query cache with TTL and on-demand refresh.

Q: Can I query Cube from an LLM agent? A: Yes — use the SQL API or the GraphQL API; Cube also ships an MCP server for agent tool use.
