Introduction
Patroni automates PostgreSQL high availability by managing streaming replication, leader election, and automatic failover. It uses a distributed consensus store (etcd, Consul, or ZooKeeper) to coordinate cluster state, ensuring one primary and zero or more synchronous or asynchronous replicas.
What Patroni Does
- Bootstraps new PostgreSQL clusters or takes over existing instances
- Performs automatic leader election and failover via distributed consensus
- Manages streaming replication configuration between primary and replicas
- Provides a REST API and patronictl CLI for cluster operations and switchover
- Supports scheduled and on-demand switchovers with no data loss
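As a sketch of driving a switchover through the REST API instead of patronictl, the Python below builds (but does not send) a POST /switchover request whose JSON body names the current leader and the candidate. The node names and the address http://10.0.0.1:8008 are hypothetical, and authentication is not handled. Check the body fields against the REST API documentation for your Patroni version.

```python
import json
import urllib.request


def build_switchover_request(api_url: str, leader: str, candidate: str) -> urllib.request.Request:
    """Build (but do not send) a POST /switchover request for a Patroni REST API."""
    body = json.dumps({"leader": leader, "candidate": candidate}).encode("utf-8")
    return urllib.request.Request(
        url=f"{api_url.rstrip('/')}/switchover",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical node names and address; pass req to urllib.request.urlopen() to send it.
req = build_switchover_request("http://10.0.0.1:8008", "pg-node1", "pg-node2")
print(req.get_method(), req.full_url)  # POST http://10.0.0.1:8008/switchover
```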
Architecture Overview
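Each PostgreSQL node runs a Patroni agent that registers itself with a DCS (distributed configuration store). The DCS holds a leader key with a time-to-live (TTL): the agent holding the key configures its PostgreSQL instance as primary and renews the key on every loop, while the other agents configure their instances as replicas. If the leader fails to renew the key before the TTL expires, the remaining agents race for it, and a healthy replica with the most advanced WAL position (within the configured lag tolerance) takes the key and is promoted. HAProxy, polling Patroni's REST API health checks, typically sits in front to route client connections to the current primary, optionally combined with PgBouncer for connection pooling.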
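The leader-key mechanism can be illustrated with a toy, single-process model. InMemoryDCS and ha_cycle are hypothetical names. A real DCS performs the acquire-or-renew step atomically across machines, and Patroni's WAL-position checks before promotion are omitted here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeaderKey:
    owner: str
    expires_at: float


class InMemoryDCS:
    """Toy stand-in for a DCS: a single leader key with a TTL (hypothetical)."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self.key: Optional[LeaderKey] = None

    def acquire_or_renew(self, node: str, now: float) -> bool:
        # The key can be taken if it is free, expired, or already ours (renewal).
        if self.key is None or self.key.expires_at <= now or self.key.owner == node:
            self.key = LeaderKey(owner=node, expires_at=now + self.ttl)
            return True
        return False


def ha_cycle(dcs: InMemoryDCS, node: str, now: float) -> str:
    """One iteration of a node's loop: whoever holds the key acts as primary."""
    return "primary" if dcs.acquire_or_renew(node, now) else "replica"


dcs = InMemoryDCS(ttl=30)
for t in (0, 10, 20, 30, 40):
    alive = ["node1", "node2"] if t <= 10 else ["node2"]  # node1 crashes after t=10
    print(t, {n: ha_cycle(dcs, n, t) for n in alive})
# node1's last renewal at t=10 expires at t=40, when node2 takes over as primary.
```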
Self-Hosting & Configuration
- Requires a running DCS (etcd, Consul, or ZooKeeper) for consensus
- Configure patroni.yml with PostgreSQL data directory, replication settings, and DCS endpoints
- Deploy one Patroni agent per PostgreSQL node, typically managed by systemd
- Place HAProxy or a connection pooler in front for transparent client routing
- Tune TTL, loop_wait, and retry_timeout to balance failover speed and stability
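A minimal sketch of sanity-checking the timing settings before placing them under bootstrap.dcs in patroni.yml. The values are Patroni's defaults (ttl 30, loop_wait 10, retry_timeout 10). The check encodes the often-cited rule of thumb loop_wait + 2 * retry_timeout <= ttl, which is treated here as an assumption rather than a check Patroni performs in this exact form; check_timing is a hypothetical helper.

```python
from typing import List, Mapping

# Patroni's default values for these dynamic settings (configured under bootstrap.dcs).
DCS_SETTINGS = {"ttl": 30, "loop_wait": 10, "retry_timeout": 10}


def check_timing(settings: Mapping[str, int]) -> List[str]:
    """Return human-readable problems with the timing settings (empty list if none)."""
    ttl = settings["ttl"]
    loop_wait = settings["loop_wait"]
    retry_timeout = settings["retry_timeout"]
    problems = []
    # Rule of thumb (assumption): the leader key should outlive one full loop
    # plus two DCS retry windows, so short DCS hiccups do not cost the lock.
    budget = loop_wait + 2 * retry_timeout
    if budget > ttl:
        problems.append(f"loop_wait + 2 * retry_timeout = {budget}s exceeds ttl = {ttl}s")
    return problems


print(check_timing(DCS_SETTINGS))  # []
print(check_timing({"ttl": 20, "loop_wait": 10, "retry_timeout": 10}))
```

Shortening ttl speeds up failover detection, but the rule above then forces loop_wait and retry_timeout down as well, which makes the cluster more sensitive to brief DCS or network stalls.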
Key Features
- Automatic failover with configurable data loss tolerance (synchronous mode available)
- REST API for health checks, switchover, and reinitializing failed nodes
- Supports custom bootstrap and replica-creation methods, including pg_basebackup, WAL-E, and pgBackRest
- Watchdog integration for split-brain prevention
- Used in production by Zalando and many other organizations
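The configurable data loss tolerance is governed by the maximum_lag_on_failover setting (in bytes, default 1048576). The sketch below is a centralized simplification: in Patroni each replica decides for itself whether it is healthy enough to take the leader key, and real WAL positions are LSNs rather than plain integers. Replica and pick_failover_candidate are hypothetical; the nofailover flag mirrors Patroni's tag of the same name.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    wal_position: int         # received WAL position as a byte offset (simplified LSN)
    nofailover: bool = False  # mirrors Patroni's nofailover tag


def pick_failover_candidate(
    leader_last_position: int,
    replicas: List[Replica],
    maximum_lag_on_failover: int = 1048576,  # Patroni's default, in bytes
) -> Optional[Replica]:
    """Choose the most advanced replica whose lag is within the tolerance."""
    eligible = [
        r for r in replicas
        if not r.nofailover
        and leader_last_position - r.wal_position <= maximum_lag_on_failover
    ]
    return max(eligible, key=lambda r: r.wal_position, default=None)


replicas = [Replica("node2", 5_000_000), Replica("node3", 4_900_000), Replica("node4", 100)]
best = pick_failover_candidate(5_500_000, replicas)
print(best.name if best else "no eligible replica")  # node2
```

Lowering maximum_lag_on_failover trades availability for less potential data loss: a lagging cluster may end up with no eligible replica at all, which is when synchronous mode becomes the better tool.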
Comparison with Similar Tools
- Stolon: Go-based PostgreSQL HA manager with a similar DCS-based architecture, but less actively maintained
- repmgr: replication manager with manual or automatic failover (via repmgrd); it does not use a DCS, whereas Patroni relies on one for leader election
- pg_auto_failover (Citus): uses a dedicated monitor node instead of a DCS; simpler to set up but less flexible than Patroni
- CloudNativePG: Kubernetes-native PostgreSQL operator; Patroni is infrastructure-agnostic and also runs on VMs and bare metal
FAQ
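Q: What happens if the DCS goes down? A: No failover can occur until the DCS recovers. By default, a primary that cannot update its leader key for longer than retry_timeout demotes itself to read-only to avoid split-brain. With failsafe_mode enabled (Patroni 3.0 and later), the primary keeps running as long as it can still reach every other cluster member through the REST API.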
Q: Can I use Patroni with existing PostgreSQL instances? A: Yes. Patroni can adopt running PostgreSQL instances without reinitializing them.
Q: How fast is automatic failover? A: Typically 10-30 seconds, depending on TTL and loop_wait configuration.
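Q: Does Patroni handle connection routing? A: Not directly. Patroni exposes REST API health endpoints (for example, /primary returns HTTP 200 only on the current leader). Pair it with HAProxy, which polls those endpoints, or with PgBouncer or a service mesh, for automatic connection routing.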
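To illustrate the health-check approach, the sketch below probes each node's /primary endpoint and returns the first one that answers HTTP 200, which is roughly what an HAProxy HTTP check does. The hostnames and port 8008 are assumptions, and is_primary and find_primary are hypothetical helpers. In production, let HAProxy perform these checks rather than client code.

```python
import urllib.request
from typing import Callable, Iterable, Optional


def is_primary(api_url: str, timeout: float = 2.0) -> bool:
    """True if GET <api_url>/primary answers HTTP 200 (only the leader should)."""
    try:
        with urllib.request.urlopen(f"{api_url.rstrip('/')}/primary", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers HTTPError (e.g. 503 from a replica), URLError, and timeouts.
        return False


def find_primary(api_urls: Iterable[str],
                 probe: Callable[[str], bool] = is_primary) -> Optional[str]:
    """Return the first REST API URL whose node reports itself as primary."""
    for url in api_urls:
        if probe(url):
            return url
    return None


# Usage against real nodes (hypothetical hosts, REST API on port 8008):
#   find_primary(["http://pg-node1:8008", "http://pg-node2:8008"])
# Offline demo with a fake probe in place of HTTP requests:
nodes = ["http://pg-node1:8008", "http://pg-node2:8008", "http://pg-node3:8008"]
print(find_primary(nodes, probe=lambda u: "pg-node2" in u))  # http://pg-node2:8008
```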