Cube — Open Source Semantic Layer for Data Apps
Cube is a headless semantic layer that turns your warehouse into a reusable API for BI, embedded analytics, and AI — defining metrics once and serving them via SQL, REST, GraphQL, and MDX.
What it is
Cube (formerly Cube.js) is an open-source semantic layer that sits between your data warehouse and your data consumers. You define metrics, dimensions, and joins in a data model, and Cube exposes them via SQL, REST, GraphQL, and MDX APIs. Any BI tool, embedded dashboard, or AI application queries Cube instead of writing raw SQL against the warehouse.
Cube is for data engineers, analytics engineers, and product teams who need consistent metrics across multiple consumers. Instead of duplicating metric logic in every dashboard and notebook, you define it once in Cube and serve it everywhere.
How it saves time or tokens
Without a semantic layer, every consumer writes its own SQL. Two dashboards computing 'monthly active users' with slightly different WHERE clauses produce conflicting numbers. Cube eliminates this by centralizing metric definitions.
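To make the conflict concrete, here is a small self-contained sketch using an in-memory SQLite database. The `events` table, its columns, and its rows are invented for illustration and do not come from Cube; the point is only that two reasonable WHERE clauses give two different "monthly active users" numbers.

```python
import sqlite3

# Hypothetical event data, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (user_id INTEGER, event_type TEXT, event_date TEXT);
INSERT INTO events VALUES
  (1, 'login',    '2024-01-05'),
  (2, 'login',    '2024-01-10'),
  (2, 'purchase', '2024-01-11'),
  (3, 'pageview', '2024-01-20'),
  (4, 'login',    '2023-12-31');
""")

JANUARY = "event_date >= '2024-01-01' AND event_date < '2024-02-01'"

# Dashboard A: any event in January counts as "active".
mau_a = conn.execute(
    f"SELECT COUNT(DISTINCT user_id) FROM events WHERE {JANUARY}"
).fetchone()[0]

# Dashboard B: only logins in January count as "active".
mau_b = conn.execute(
    f"SELECT COUNT(DISTINCT user_id) FROM events "
    f"WHERE {JANUARY} AND event_type = 'login'"
).fetchone()[0]

print(mau_a, mau_b)  # prints "3 2": same metric name, two answers
```

With a single shared definition, both dashboards would request the same measure and could not drift apart this way.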
For AI applications, Cube's API is a smaller target than raw SQL. An LLM can request the Orders.count measure through the Cube API instead of constructing JOIN statements itself. This can reduce token usage and hallucination risk, because the model only has to name members of a curated, validated data model rather than reason about table structure.
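One way to take advantage of a curated model is to check an LLM-proposed query against the list of known members before sending it. The sketch below is an assumption about how an application might do that, not part of Cube: `KNOWN_MEMBERS` and `unknown_members` are hypothetical names, and the member list simply mirrors the Orders cube used on this page.

```python
# Hypothetical catalogue of members exposed by the data model.
KNOWN_MEMBERS = {
    "measures": {"Orders.count", "Orders.total_revenue"},
    "dimensions": {"Orders.status", "Orders.created_at"},
}


def unknown_members(query: dict) -> list[str]:
    """Return members named in the query that the data model does not define."""
    unknown = []
    for kind in ("measures", "dimensions"):
        for member in query.get(kind, []):
            if member not in KNOWN_MEMBERS[kind]:
                unknown.append(member)
    # Time dimensions reference ordinary dimensions by name.
    for time_dimension in query.get("timeDimensions", []):
        name = time_dimension.get("dimension", "")
        if name not in KNOWN_MEMBERS["dimensions"]:
            unknown.append(name)
    return unknown


good = {"measures": ["Orders.count"], "dimensions": ["Orders.status"]}
bad = {"measures": ["Orders.profit"]}  # a member the model never defined

print(unknown_members(good))  # prints []
print(unknown_members(bad))   # prints ['Orders.profit']
```

Rejecting a query with unknown members is cheaper than letting the model invent SQL against columns that do not exist.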
How to use
- Create a new Cube project:
npx cubejs-cli create my-cube -d postgres
cd my-cube
- Define a data model in schema/Orders.js:
cube('Orders', {
  sql_table: 'public.orders',

  // Measures are aggregations computed at query time.
  measures: {
    count: { type: 'count' },
    total_revenue: { type: 'sum', sql: 'amount' },
  },

  // Dimensions are the columns you group and filter by.
  dimensions: {
    status: { sql: 'status', type: 'string' },
    created_at: { sql: 'created_at', type: 'time' },
  },
});
- Start Cube and query via the API:
docker compose up -d
curl -G 'http://localhost:4000/cubejs-api/v1/load' --data-urlencode 'query={"measures":["Orders.count"]}'
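The query sent to the load endpoint is a JSON document carried in a URL parameter, so it needs to be URL-encoded. As a minimal sketch using only the Python standard library, the helper below builds such a URL; `build_load_url` is a hypothetical name, and the base URL is the local one from the step above.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Local endpoint taken from the curl step; adjust for your deployment.
BASE_URL = "http://localhost:4000/cubejs-api/v1/load"


def build_load_url(query: dict) -> str:
    """Serialize the query to JSON and URL-encode it as the `query` parameter."""
    return f"{BASE_URL}?{urlencode({'query': json.dumps(query)})}"


url = build_load_url({"measures": ["Orders.count"]})
print(url)

# Round trip: decoding the parameter gives back the original query.
decoded = json.loads(parse_qs(urlsplit(url).query)["query"][0])
print(decoded)
```

Building the query as a dictionary and serializing it avoids hand-written JSON strings, which are easy to break with a stray quote or brace.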
Example
Querying Cube from a Python AI pipeline:
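import json

import requests

# Monthly revenue: one measure grouped by month on Orders.created_at.
query = {
    "measures": ["Orders.total_revenue"],
    "timeDimensions": [{"dimension": "Orders.created_at", "granularity": "month"}],
}
response = requests.get(
    "http://localhost:4000/cubejs-api/v1/load",
    params={"query": json.dumps(query)},  # requests URL-encodes the JSON string
)
response.raise_for_status()  # fail loudly on HTTP errors
for row in response.json()["data"]:
    print(f"{row['Orders.created_at.month']}: ${row['Orders.total_revenue']}")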
Related on TokRepo
- Database AI tools -- tools for database management and analytics
- RAG tools -- retrieval-augmented generation with structured data
Common pitfalls
- Cube's pre-aggregation system caches query results for performance, but stale caches can serve outdated data. Configure refresh keys to match your data freshness requirements.
- The data model syntax differs between Cube's JavaScript and YAML formats. Pick one format for consistency across your team.
- The free tier of Cube Cloud has limitations on concurrency and query volume. For production workloads with many concurrent users, evaluate Cube Cloud pricing.
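The stale-cache pitfall is easier to reason about with a toy model. The sketch below is a conceptual illustration in plain Python, not Cube's implementation or configuration syntax: it assumes a time-based refresh key behaves like a bucket number, so a cached value is reused until the bucket changes. `BucketedCache` and `refresh_bucket` are hypothetical names.

```python
from typing import Any, Callable, Optional


def refresh_bucket(now_seconds: float, every_seconds: int) -> int:
    """Map a timestamp to the index of the refresh interval it falls in."""
    return int(now_seconds // every_seconds)


class BucketedCache:
    """Recompute a value only when the refresh bucket changes."""

    def __init__(self, every_seconds: int, compute: Callable[[], Any]) -> None:
        self.every_seconds = every_seconds
        self._compute = compute
        self._bucket: Optional[int] = None
        self._value: Any = None

    def get(self, now_seconds: float) -> Any:
        bucket = refresh_bucket(now_seconds, self.every_seconds)
        if bucket != self._bucket:
            # The key changed, so the cached value is considered stale.
            self._value = self._compute()
            self._bucket = bucket
        return self._value


calls = []


def count_orders() -> int:
    """Stand-in for an expensive warehouse query."""
    calls.append(1)
    return len(calls)


cache = BucketedCache(every_seconds=3600, compute=count_orders)
first = cache.get(now_seconds=0)      # computes
second = cache.get(now_seconds=1800)  # same hourly bucket, reuses the value
third = cache.get(now_seconds=3600)   # new bucket, recomputes
print(first, second, third)  # prints "1 1 2"
```

In this model, rows that arrive at second 1800 stay invisible until second 3600, which is why the refresh interval should be chosen from the freshness your consumers actually need.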
Frequently Asked Questions
Which databases does Cube support?
Cube supports PostgreSQL, MySQL, BigQuery, Snowflake, Redshift, ClickHouse, Databricks, and many other SQL databases. Each warehouse has a dedicated driver that handles dialect-specific SQL generation.
Can I use Cube with my existing BI tools?
Yes. Cube exposes a SQL API that BI tools can connect to as if it were a database. Metabase, Superset, Tableau, and Power BI can all query Cube's semantic layer through standard SQL connections.
How does Cube handle caching?
Cube uses pre-aggregations to cache query results in materialized tables. You define refresh schedules and partition strategies in the data model. For real-time data, Cube can bypass caches and query the source directly.
Can I use Cube for embedded analytics?
Yes. Cube provides REST and GraphQL APIs with JWT-based multi-tenant security. You can embed analytics in your product by querying Cube from your frontend and rendering results with any charting library.
How does Cube relate to dbt?
dbt transforms data at rest in the warehouse (batch transformations). Cube serves data at query time through APIs. They are complementary: dbt builds your warehouse tables, Cube defines metrics on top of those tables and serves them to consumers.