Skills · Apr 15, 2026 · 3 min read

Cube — Open Source Semantic Layer for Data Apps

Cube is a headless semantic layer that turns your warehouse into a reusable API for BI, embedded analytics, and AI — defining metrics once and serving them via SQL, REST, GraphQL, and MDX.

Script Depot · Community

Agent ready

Review-first install path

This asset needs a review step. The copied prompt tells the agent to dry-run, show the writes, then proceed only after confirmation.

Needs Confirmation · 64/100 · Policy: confirm

Agent surface: Any MCP/CLI agent
Kind: Skill
Install: Single
Trust: Established
Entrypoint: Cube Guide

Review-first command

npx -y tokrepo@latest install 1a259f83-3908-11f1-9bc6-00163e2b0d79 --target codex

Dry-run first, confirm the writes, then run this command.

TL;DR

Cube lets you define metrics once, in a data model on top of your warehouse, and serves them via SQL, REST, GraphQL, and MDX to any consumer.

§01

What it is

Cube (formerly Cube.js) is an open-source semantic layer that sits between your data warehouse and your data consumers. You define metrics, dimensions, and joins in a data model, and Cube exposes them via SQL, REST, GraphQL, and MDX APIs. Any BI tool, embedded dashboard, or AI application queries Cube instead of writing raw SQL against the warehouse.

Cube is for data engineers, analytics engineers, and product teams who need consistent metrics across multiple consumers. Instead of duplicating metric logic in every dashboard and notebook, you define it once in Cube and serve it everywhere.

§02

How it saves time or tokens

Without a semantic layer, every consumer writes its own SQL. Two dashboards computing 'monthly active users' with slightly different WHERE clauses produce conflicting numbers. Cube eliminates this by centralizing metric definitions.

For AI applications, Cube's API is a simpler target than raw SQL. An LLM can request Orders.count through the Cube API rather than constructing complex JOIN statements. Because the LLM works against a curated, validated data model, prompts and generated queries tend to be shorter, which can reduce token usage and hallucination risk.
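As a rough illustration of that idea, the sketch below turns a list of member names (the kind of short, structured output an LLM might produce) into a Cube query and rejects anything outside the model. The allowlists and the `build_query` helper are hypothetical and only mirror the Orders example used on this page; they are not part of Cube itself.

```python
import json

# Hypothetical allowlists mirroring the members exposed by the data model.
ALLOWED_MEASURES = {"Orders.count", "Orders.total_revenue"}
ALLOWED_DIMENSIONS = {"Orders.status", "Orders.created_at"}


def build_query(measures, dimensions=()):
    """Return a Cube query as a JSON string, rejecting unknown members."""
    unknown = (set(measures) - ALLOWED_MEASURES) | (
        set(dimensions) - ALLOWED_DIMENSIONS
    )
    if unknown:
        raise ValueError(f"Unknown members: {sorted(unknown)}")
    query = {"measures": list(measures)}
    if dimensions:
        query["dimensions"] = list(dimensions)
    return json.dumps(query)


# A valid request: the LLM only had to name two members, no JOINs.
print(build_query(["Orders.count"], ["Orders.status"]))
```

The returned string can be passed as the `query` parameter of the REST API. A hallucinated member such as `Orders.made_up` raises a `ValueError` before any request is sent.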

§03

How to use

Create a new Cube project:

npx cubejs-cli create my-cube -d postgres
cd my-cube

Define a data model in schema/Orders.js:

cube('Orders', {
  sql_table: 'public.orders',
  measures: {
    count: { type: 'count' },
    total_revenue: { type: 'sum', sql: 'amount' },
  },
  dimensions: {
    status: { sql: 'status', type: 'string' },
    created_at: { sql: 'created_at', type: 'time' },
  },
});

Start Cube and query via the API:

docker compose up -d
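# -G with --data-urlencode encodes the JSON and stops curl from globbing its braces and brackets
curl -G 'http://localhost:4000/cubejs-api/v1/load' --data-urlencode 'query={"measures":["Orders.count"]}'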

§04

Example

Querying Cube from a Python AI pipeline:

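import json
import requests

# Build the query as a dict so the JSON stays valid and easy to edit.
query = {
    "measures": ["Orders.total_revenue"],
    "timeDimensions": [{"dimension": "Orders.created_at", "granularity": "month"}],
}
response = requests.get(
    "http://localhost:4000/cubejs-api/v1/load",
    params={"query": json.dumps(query)},
    timeout=30,
)
response.raise_for_status()  # surface HTTP errors instead of a KeyError below
for row in response.json()["data"]:
    print(f"{row['Orders.created_at.month']}: ${row['Orders.total_revenue']}")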
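One thing a pipeline may need to handle: for long-running queries, Cube's REST API can answer with an `{"error": "Continue wait"}` body instead of `data`, and the client is expected to repeat the request. The sketch below shows one way to wrap that, assuming this behavior. `load_with_retry`, its parameters, and the fake replies are illustrative names, not part of any Cube client library.

```python
import time


def load_with_retry(fetch, max_attempts=5, delay_seconds=1.0):
    """Call fetch() until the body is no longer a 'Continue wait' reply.

    fetch is any zero-argument callable returning the decoded JSON body,
    for example: lambda: requests.get(url, params=params).json()
    """
    for _ in range(max_attempts):
        body = fetch()
        if body.get("error") == "Continue wait":
            time.sleep(delay_seconds)  # query still running, ask again
            continue
        if "error" in body:
            raise RuntimeError(body["error"])
        return body["data"]
    raise TimeoutError("Cube query still running after retries")


# Demo with canned replies so the sketch runs without a Cube server.
replies = iter([{"error": "Continue wait"}, {"data": [{"Orders.count": "3"}]}])
rows = load_with_retry(lambda: next(replies), delay_seconds=0)
print(rows)
```

Passing the request as a callable keeps the retry logic independent of the HTTP library, so the same helper works with `requests`, `httpx`, or a test double.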

§05

Related on TokRepo

Database AI tools -- tools for database management and analytics
RAG tools -- retrieval-augmented generation with structured data

§06

Common pitfalls

Cube's pre-aggregations store precomputed rollups for performance, but a stale rollup serves outdated data. Configure each pre-aggregation's refresh key to match your data freshness requirements.
The data model syntax differs between Cube's JavaScript and YAML formats. Pick one format for consistency across your team.
Cube Cloud's free tier has limits on concurrency and query volume. For production workloads with many concurrent users, evaluate Cube Cloud pricing.

Frequently Asked Questions

What data warehouses does Cube support?

Cube supports PostgreSQL, MySQL, BigQuery, Snowflake, Redshift, ClickHouse, Databricks, and many other SQL databases. Each warehouse has a dedicated driver that handles dialect-specific SQL generation.

Can I use Cube with BI tools like Metabase or Superset?

Yes. Cube exposes a SQL API that BI tools can connect to as if it were a database. Metabase, Superset, Tableau, and Power BI can all query Cube's semantic layer through standard SQL connections.

How does Cube handle caching?

Cube uses pre-aggregations to cache query results in materialized tables. You define refresh schedules and partition strategies in the data model. For real-time data, Cube can bypass caches and query the source directly.

Is Cube suitable for embedded analytics?

Yes. Cube provides REST and GraphQL APIs with JWT-based multi-tenant security. You can embed analytics in your product by querying Cube from your frontend and rendering results with any charting library.

How does Cube compare to dbt?+

dbt transforms data at rest in the warehouse (batch transformations). Cube serves data at query time through APIs. They are complementary: dbt builds your warehouse tables, Cube defines metrics on top of those tables and serves them to consumers.

