Citus — Distributed PostgreSQL for Sharding and HTAP
A Postgres extension that turns your database into a distributed cluster with sharding, columnar storage and parallel query — keeping full SQL, ACID, JSONB, PostGIS and the Postgres ecosystem intact.
Agent-ready installation
This asset can be installed after choosing the runtime, reviewing the plan, and running the corresponding command.
npx -y tokrepo@latest install cb669573-3920-11f1-9bc6-00163e2b0d79 --target codex
Run after confirming the plan with a dry run.
What it is
Citus is a PostgreSQL extension that transforms a single-node Postgres database into a distributed cluster. It adds horizontal sharding, columnar storage, and parallel query execution while preserving full SQL compatibility, ACID transactions, JSONB, PostGIS, and the entire Postgres extension ecosystem.
Citus is designed for multi-tenant SaaS applications and real-time analytics workloads (HTAP). It distributes tables across worker nodes by a chosen distribution column (typically tenant_id), allowing queries to be parallelized across shards. Citus is also offered as a managed service on Microsoft Azure (Azure Cosmos DB for PostgreSQL) and remains fully open-source.
How it saves time or tokens
Citus eliminates the need to migrate away from PostgreSQL when your data outgrows a single node. Instead of rewriting your application for a different database, you add the Citus extension and distribute your tables. Most existing SQL queries, indexes, and Postgres features continue to work, subject to the distributed-mode limitations noted under Common pitfalls. Columnar storage can compress analytical data by up to 10x, reducing storage costs for time-series and log data.
How to use
- Spin up a Citus cluster using the official Docker setup: `docker compose -p citus up -d --scale worker=2`
- Connect to the coordinator node: `psql -U postgres`
- Create tables and distribute them: `SELECT create_distributed_table('events', 'tenant_id');`
- Query normally; Citus parallelizes execution across workers automatically.
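After the steps above, it can help to confirm that the workers registered with the coordinator and that the table was actually split into shards. The following is a minimal sketch, assuming Citus 10 or later (where the `citus_get_active_worker_nodes()` function and the `citus_shards` view are available under these names) and the `events` table from the steps above:

```sql
-- List the worker nodes the coordinator currently knows about.
SELECT * FROM citus_get_active_worker_nodes();

-- Show which worker holds each shard of the distributed table.
SELECT shardid, nodename, nodeport
FROM citus_shards
WHERE table_name = 'events'::regclass
ORDER BY shardid;
```

If the first query returns no rows, the workers have not been added to the cluster yet, and distributed queries have nowhere to fan out to.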
Example
-- Create and distribute a table
CREATE TABLE events (
    tenant_id BIGINT,
    id        BIGSERIAL,
    payload   JSONB,
    ts        TIMESTAMPTZ DEFAULT now()
);
SELECT create_distributed_table('events', 'tenant_id');
-- Columnar storage for analytics
CREATE TABLE logs (ts TIMESTAMPTZ, level TEXT, message TEXT)
USING columnar;
-- Query across shards (->> returns text, so cast before averaging)
SELECT tenant_id, COUNT(*), AVG((payload->>'duration')::numeric)
FROM events
WHERE ts > now() - INTERVAL '1 day'
GROUP BY tenant_id;
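To see how a query like the one above is split across workers, one option is to run `EXPLAIN` on the coordinator. This is a sketch, not a definitive recipe: the exact plan text varies by Citus version, and the tenant value `42` is a made-up example. The cast to `numeric` is there because `->>` returns text, which `AVG` does not accept.

```sql
-- Fan-out query: the plan should list one task per shard.
EXPLAIN
SELECT tenant_id, COUNT(*), AVG((payload->>'duration')::numeric)
FROM events
WHERE ts > now() - INTERVAL '1 day'
GROUP BY tenant_id;

-- Single-tenant query: filtering on the distribution column lets
-- Citus route the query to the one shard that holds that tenant.
EXPLAIN
SELECT COUNT(*)
FROM events
WHERE tenant_id = 42
  AND ts > now() - INTERVAL '1 day';
```

Comparing the two plans is a quick way to check whether a query touches every shard or only one.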
Common pitfalls
- Choosing the wrong distribution column leads to data skew and hot shards. Pick a column with high cardinality and even distribution (tenant_id is the classic choice).
- Cross-shard joins are expensive. Design your schema so that co-located tables share the same distribution column to keep joins local.
- Not all Postgres features work identically in distributed mode. Check Citus documentation for limitations around CTEs, window functions, and certain DDL operations.
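The first two pitfalls can be illustrated with a short sketch. It assumes Citus 10 or later (for the `citus_shards` view) and the `events` table from the example; the `event_notes` table and its columns are hypothetical and exist only for illustration.

```sql
-- Hypothetical second table that shares the tenant_id distribution column.
CREATE TABLE event_notes (
    tenant_id BIGINT,
    event_id  BIGINT,
    note      TEXT
);

-- Co-locate it with events so rows for the same tenant land on the same worker.
SELECT create_distributed_table('event_notes', 'tenant_id',
                                colocate_with => 'events');

-- A join that includes the distribution column can run shard-locally.
SELECT e.tenant_id, COUNT(n.note) AS notes
FROM events e
JOIN event_notes n
  ON n.tenant_id = e.tenant_id
 AND n.event_id  = e.id
GROUP BY e.tenant_id;

-- Rough skew check: shards that are much larger than the rest
-- suggest an uneven distribution column.
SELECT nodename, shardid, pg_size_pretty(shard_size) AS size
FROM citus_shards
WHERE table_name = 'events'::regclass
ORDER BY shard_size DESC;
```

Leaving `tenant_id` out of the join condition would force Citus to move data between workers or reject the query, depending on configuration, which is the expensive case the second pitfall describes.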
Frequently asked questions
What is Citus used for?
Citus is used for scaling PostgreSQL horizontally. Primary use cases are multi-tenant SaaS applications (isolate tenant data across shards) and real-time analytics (parallel aggregation across distributed data). It handles both OLTP and OLAP workloads.
Is Citus open-source?
Yes. Citus is open-source under the AGPL license. It is also available as a managed service through Azure Cosmos DB for PostgreSQL. The open-source version includes all sharding, columnar storage, and parallel query features.
How does Citus differ from PostgreSQL partitioning?
PostgreSQL partitioning splits data within a single node. Citus distributes data across multiple nodes with a coordinator that routes queries. Partitioning helps with pruning; Citus helps with horizontal scaling beyond what one machine can handle.
Does Citus work with other Postgres extensions such as PostGIS?
Yes. Citus is a Postgres extension, so it works alongside other extensions including PostGIS, pg_trgm, and hstore. Geospatial queries on distributed tables work as expected when the distribution column is included in the query.
What is columnar storage in Citus?
Columnar storage stores data by column rather than by row, which improves compression ratios and scan performance for analytical queries. In Citus, create a table with `USING columnar` to enable it. Best for append-only data like logs and time-series.
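For a table that already exists in row format, a conversion might look like the sketch below. It assumes Citus 10 or later, where `alter_table_set_access_method` is available; the `metrics` table is hypothetical. Depending on the Citus version, columnar tables may not support `UPDATE` or `DELETE`, which is one reason they suit append-only data.

```sql
-- Hypothetical row-based table that has become append-only.
CREATE TABLE metrics (
    ts    TIMESTAMPTZ,
    name  TEXT,
    value DOUBLE PRECISION
);

-- Rewrite the table into columnar storage.
SELECT alter_table_set_access_method('metrics', 'columnar');

-- Compare on-disk size before and after the conversion.
SELECT pg_size_pretty(pg_total_relation_size('metrics'));
```

Running the size query before and after the conversion on real data gives a direct measure of the compression achieved for that workload.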
Related assets
Apache AGE — Graph Database Extension for PostgreSQL
Apache AGE (A Graph Extension) adds graph database capabilities to PostgreSQL. Query your existing Postgres data as a graph using openCypher while keeping full SQL compatibility.
Patroni — High Availability PostgreSQL Cluster Manager
Patroni is a Python-based template for creating and managing highly available PostgreSQL clusters with automatic failover, using distributed consensus stores like etcd, Consul, or ZooKeeper.
PgCat — PostgreSQL Pooler with Sharding and Load Balancing
A high-performance PostgreSQL connection pooler written in Rust that supports sharding, read/write splitting, load balancing, and automatic failover.
PostgreSQL — The Most Advanced Open Source Relational Database
PostgreSQL is the most powerful open-source relational database system. It combines SQL compliance, extensibility, and reliability with advanced features like JSONB, full-text search, vector embeddings (pgvector), and PostGIS — making it the database of choice for modern applications.