ConfigsApr 15, 2026·3 min read

Citus — Distributed PostgreSQL for Sharding and HTAP

A Postgres extension that turns your database into a distributed cluster with sharding, columnar storage and parallel query — keeping full SQL, ACID, JSONB, PostGIS and the Postgres ecosystem intact.

TL;DR
Citus turns PostgreSQL into a distributed cluster with sharding, columnar storage, and parallel queries while keeping full SQL.
§01

What it is

Citus is a PostgreSQL extension that transforms a single-node Postgres database into a distributed cluster. It adds horizontal sharding, columnar storage, and parallel query execution while preserving full SQL compatibility, ACID transactions, JSONB, PostGIS, and the entire Postgres extension ecosystem.

Citus is designed for multi-tenant SaaS applications and real-time analytics workloads (HTAP). It distributes tables across worker nodes by a chosen distribution column (typically tenant_id), allowing queries to be parallelized across shards. Now part of Microsoft Azure (Azure Cosmos DB for PostgreSQL), Citus remains fully open-source.

§02

How it saves time or tokens

Citus eliminates the need to migrate away from PostgreSQL when your data outgrows a single node. Instead of rewriting your application for a different database, you add the Citus extension and distribute your tables. Existing SQL queries, indexes, and Postgres features continue to work. Columnar storage compresses analytical data by up to 10x, reducing storage costs for time-series and log data.

§03

How to use

  1. Spin up a Citus cluster: docker compose -p citus up -d --scale worker=2 using the official Docker setup.
  2. Connect to the coordinator node: psql -U postgres.
  3. Create tables and distribute them: SELECT create_distributed_table('events', 'tenant_id');.
  4. Query normally -- Citus parallelizes execution across workers automatically.
§04

Example

-- Create and distribute a table
CREATE TABLE events (
  tenant_id BIGINT,
  id BIGSERIAL,
  payload JSONB,
  ts TIMESTAMPTZ DEFAULT now()
);
SELECT create_distributed_table('events', 'tenant_id');

-- Columnar storage for analytics
CREATE TABLE logs (ts TIMESTAMPTZ, level TEXT, message TEXT)
  USING columnar;

-- Query across shards
SELECT tenant_id, COUNT(*), AVG(payload->>'duration')
FROM events
WHERE ts > now() - INTERVAL '1 day'
GROUP BY tenant_id;
§05

Related on TokRepo

§06

Common pitfalls

  • Choosing the wrong distribution column leads to data skew and hot shards. Pick a column with high cardinality and even distribution (tenant_id is the classic choice).
  • Cross-shard joins are expensive. Design your schema so that co-located tables share the same distribution column to keep joins local.
  • Not all Postgres features work identically in distributed mode. Check Citus documentation for limitations around CTEs, window functions, and certain DDL operations.

Frequently Asked Questions

What is Citus used for?+

Citus is used for scaling PostgreSQL horizontally. Primary use cases are multi-tenant SaaS applications (isolate tenant data across shards) and real-time analytics (parallel aggregation across distributed data). It handles both OLTP and OLAP workloads.

Is Citus free and open source?+

Yes. Citus is open-source under the AGPL license. It is also available as a managed service through Azure Cosmos DB for PostgreSQL. The open-source version includes all sharding, columnar storage, and parallel query features.

How does Citus compare to native PostgreSQL partitioning?+

PostgreSQL partitioning splits data within a single node. Citus distributes data across multiple nodes with a coordinator that routes queries. Partitioning helps with pruning; Citus helps with horizontal scaling beyond what one machine can handle.

Can I use PostGIS with Citus?+

Yes. Citus is a Postgres extension, so it works alongside other extensions including PostGIS, pg_trgm, and hstore. Geospatial queries on distributed tables work as expected when the distribution column is included in the query.

What is columnar storage in Citus?+

Columnar storage stores data by column rather than by row, which improves compression ratios and scan performance for analytical queries. In Citus, create a table with `USING columnar` to enable it. Best for append-only data like logs and time-series.

Citations (3)

Discussion

Sign in to join the discussion.
No comments yet. Be the first to share your thoughts.

Related Assets