Introduction
The Zalando Postgres Operator manages PostgreSQL clusters on Kubernetes, each declared as a postgresql custom resource. It was developed and battle-tested at Zalando, where it runs hundreds of PostgreSQL instances in production. It handles cluster creation, high availability via Patroni, connection pooling, version upgrades, and continuous WAL-based backups.
What Zalando Postgres Operator Does
- Creates and manages multi-replica PostgreSQL clusters as Kubernetes custom resources
- Provides automatic failover and high availability using Patroni, which coordinates leader election through a distributed configuration store (the Kubernetes API by default)
- Deploys PgBouncer connection poolers on request, as a separate Deployment in front of each cluster
- Handles minor version upgrades with rolling pod restarts and major version upgrades in place via pg_upgrade, keeping downtime short
- Manages continuous WAL archiving and base backups to S3-compatible storage
Architecture Overview
The operator watches postgresql custom resources and reconciles each cluster toward its declared state. Each cluster runs as a StatefulSet whose pods run PostgreSQL together with Patroni (packaged in Zalando's Spilo image). Patroni manages leader election through Kubernetes objects (Endpoints by default) or etcd. When connection pooling is enabled, PgBouncer runs as a separate pooler Deployment. The operator also creates services for the primary and the replicas and manages secrets holding database credentials. If logical backups are enabled, it schedules them as CronJobs. WAL-E or WAL-G, running inside the database pods, handles continuous backup to object storage.
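To make the architecture concrete, here is a small Python sketch that derives the names of the objects the operator creates for one cluster. It assumes the operator's default naming. The StatefulSet and primary service are named after the cluster, the replica service gets a -repl suffix, and credentials secrets follow the default secret name template. These templates can be changed in the operator configuration, and the function name is hypothetical.

```python
def derived_resource_names(cluster: str, users: list, replicas: int) -> dict:
    """Default names of objects derived from a postgresql CR named `cluster`.

    Assumes the operator's default naming; the secret name template is
    configurable, and usernames are assumed to already be valid DNS labels.
    """
    return {
        # One StatefulSet per cluster; its pods get ordinal suffixes.
        "statefulset": cluster,
        "pods": [f"{cluster}-{i}" for i in range(replicas)],
        # The primary service follows the Patroni leader; "-repl" targets replicas.
        "primary_service": cluster,
        "replica_service": f"{cluster}-repl",
        # One credentials secret per database role.
        "secrets": [f"{u}.{cluster}.credentials.postgresql.acid.zalan.do" for u in users],
    }


if __name__ == "__main__":
    for kind, name in derived_resource_names("acid-minimal-cluster", ["zalando"], 2).items():
        print(f"{kind}: {name}")
```

Comparing this output with kubectl get statefulsets,services,secrets is a quick way to see which objects belong to a given cluster.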
Self-Hosting & Configuration
- Deploy the operator via kubectl, Helm chart, or Kustomize manifests
- Configure S3/GCS/Azure Blob credentials for WAL archiving and base backups
- Set global defaults (instance sizes, storage classes, backup schedules) in an OperatorConfiguration resource or, alternatively, in the operator ConfigMap
- Define per-cluster settings (replicas, resources, PostgreSQL parameters) in the postgresql CR
- Enable the operator UI for a web-based view of all managed clusters
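A minimal per-cluster definition can be sketched in Python by building the postgresql resource as a dictionary and emitting JSON, which kubectl apply accepts as readily as YAML. The field names (teamId, numberOfInstances, volume, users, databases, postgresql.version) follow the operator's minimal example manifest. The helper name and default values are illustrative assumptions, and the name-prefix check reflects the operator's convention that a cluster name starts with its team ID.

```python
import json


def minimal_cluster_manifest(team: str, name: str, instances: int = 2,
                             volume_size: str = "1Gi", pg_version: str = "16") -> dict:
    """Build a minimal postgresql custom resource as a plain dictionary."""
    # Convention: the cluster name is prefixed with "<teamId>-".
    if not name.startswith(f"{team}-"):
        raise ValueError(f"cluster name {name!r} must start with '{team}-'")
    return {
        "apiVersion": "acid.zalan.do/v1",
        "kind": "postgresql",
        "metadata": {"name": name},
        "spec": {
            "teamId": team,
            "numberOfInstances": instances,             # primary + replicas
            "volume": {"size": volume_size},            # PVC size per pod
            "users": {"zalando": ["superuser", "createdb"]},  # role -> flags
            "databases": {"foo": "zalando"},            # database -> owner role
            "postgresql": {"version": pg_version},      # major version as a string
        },
    }


if __name__ == "__main__":
    print(json.dumps(minimal_cluster_manifest("acid", "acid-minimal-cluster"), indent=2))
```

Usage, with a hypothetical script name: python make_cluster.py > cluster.json, then kubectl apply -f cluster.json.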
Key Features
- Patroni-based HA with automatic leader election and replica promotion
- Optional PgBouncer connection pooling; the pooler reaches PostgreSQL through the cluster's primary service, so it follows a failover without being reconfigured
- Logical backups (scheduled pg_dumpall dumps) and physical backups (base backups plus WAL archiving); point-in-time recovery relies on the physical backups
- Clone-from-backup: spin up new clusters from existing WAL archives
- Team-based access control: when a Teams API is configured, members of the cluster's team are created as PostgreSQL roles automatically
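Clone-from-backup is requested through a clone section in the new cluster's spec. The Python sketch below adds one to a copy of an existing manifest. It assumes the clone keys cluster, timestamp and, when the WAL archive path contains the source cluster's UID, uid. The helper name is hypothetical. A timestamp selects point-in-time recovery from the source's WAL archive, so the sketch insists on an explicit UTC offset.

```python
import copy
from datetime import datetime
from typing import Optional


def clone_spec(base: dict, source_cluster: str, timestamp: str,
               source_uid: Optional[str] = None) -> dict:
    """Return a copy of `base` that asks the operator to clone `source_cluster`."""
    # Point-in-time targets need an explicit UTC offset, e.g. "+01:00".
    if datetime.fromisoformat(timestamp).tzinfo is None:
        raise ValueError("timestamp must include a UTC offset")
    manifest = copy.deepcopy(base)  # leave the caller's manifest untouched
    clone = {"cluster": source_cluster, "timestamp": timestamp}
    if source_uid is not None:
        # Only needed when the WAL archive path contains the source cluster's UID.
        clone["uid"] = source_uid
    manifest["spec"]["clone"] = clone
    return manifest


if __name__ == "__main__":
    base = {
        "apiVersion": "acid.zalan.do/v1",
        "kind": "postgresql",
        "metadata": {"name": "acid-restored-cluster"},
        "spec": {"teamId": "acid", "numberOfInstances": 1,
                 "volume": {"size": "1Gi"}, "postgresql": {"version": "16"}},
    }
    print(clone_spec(base, "acid-minimal-cluster", "2017-12-19T12:40:33+01:00")["spec"]["clone"])
```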
Comparison with Similar Tools
- CloudNativePG — newer operator with declarative backup management; Zalando operator is more mature with Patroni-based HA
- CrunchyData PGO — comprehensive operator with pgBackRest; Zalando operator uses WAL-G and has simpler CRD semantics
- KubeDB — multi-database operator; Zalando operator is PostgreSQL-specialized
- Bitnami PostgreSQL Helm chart — simple StatefulSet deployment without operator-level lifecycle management
FAQ
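Q: How does failover work? A: Patroni runs inside each PostgreSQL pod and keeps a leader lock in Kubernetes objects (Endpoints by default). If the primary pod disappears without releasing the lock, the lock expires after Patroni's TTL (30 seconds by default). Patroni then promotes the most up-to-date eligible replica.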
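The choice of replica can be illustrated with a deliberately simplified Python model. This is not Patroni's code. In Patroni, each replica independently checks whether it is the healthiest member before racing for the leader lock. The model keeps two real Patroni knobs: the nofailover tag and maximum_lag_on_failover (1 MiB by default). It also treats WAL positions as plain byte offsets. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    name: str
    received_lsn: int         # last WAL position received, as a byte offset
    nofailover: bool = False  # mirrors Patroni's nofailover tag


def pick_failover_candidate(replicas: List[Member], last_leader_lsn: int,
                            maximum_lag_on_failover: int = 1048576) -> Optional[str]:
    """Simplified model: the most advanced replica whose lag is within the limit."""
    eligible = [
        r for r in replicas
        if not r.nofailover
        and last_leader_lsn - r.received_lsn <= maximum_lag_on_failover
    ]
    if not eligible:
        return None  # no replica qualifies, so nobody is promoted
    return max(eligible, key=lambda r: r.received_lsn).name


if __name__ == "__main__":
    members = [Member("acid-minimal-cluster-1", 5_000), Member("acid-minimal-cluster-2", 9_000)]
    print(pick_failover_candidate(members, last_leader_lsn=10_000))  # prints acid-minimal-cluster-2
```

The lag limit trades availability for durability. A replica too far behind is not promoted, because promoting it would discard the transactions it never received.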
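Q: Can I use it with existing PostgreSQL databases? A: A new cluster can be cloned from another cluster's WAL archive (optionally to a point in time) or directly, via pg_basebackup, from a running cluster managed by the operator. Logical backups are plain pg_dumpall output and must be restored manually. Importing a running external database requires manual migration steps.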
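Q: What PostgreSQL versions are supported? A: The supported major versions depend on the operator release and its Spilo image; recent releases cover roughly PostgreSQL 12 through 17. The version is set in the cluster CR. Minor upgrades roll out as pod restarts on a new image. A major version change triggers an in-place pg_upgrade, subject to the operator's major_version_upgrade_mode setting.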
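Changing the version amounts to editing spec.postgresql.version in the cluster's manifest. The Python sketch below builds a JSON merge patch for that field. Assuming a cluster named acid-minimal-cluster (hypothetical), the patch could be applied with kubectl patch postgresql acid-minimal-cluster --type merge -p '<patch>'. Whether a major-version bump then upgrades the data in place depends on the operator's upgrade settings. The helper itself is an illustrative assumption.

```python
import json


def version_upgrade_patch(current_major: str, target_major: str) -> str:
    """JSON merge patch that bumps spec.postgresql.version in a postgresql CR."""
    if int(target_major) < int(current_major):
        # pg_upgrade only moves forward; going back needs a dump and restore instead.
        raise ValueError("major version downgrades are not supported")
    return json.dumps({"spec": {"postgresql": {"version": target_major}}})


if __name__ == "__main__":
    print(version_upgrade_patch("16", "17"))
```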
Q: Does it support connection pooling? A: Yes. The operator can deploy PgBouncer as a separate Deployment for the primary and, optionally, for the replicas, configured from the connection pooler settings in the cluster spec.
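Pooling is requested per cluster through fields in the postgresql spec. This Python sketch assumes the enableConnectionPooler, enableReplicaConnectionPooler and connectionPooler (numberOfInstances, mode) fields found in recent operator releases, and returns an updated spec. The helper name and defaults are hypothetical; any pooler setting not set here is left to the operator's defaults.

```python
def enable_pooler(spec: dict, instances: int = 2, mode: str = "transaction",
                  for_replicas: bool = False) -> dict:
    """Return a copy of a postgresql CR spec with the connection pooler turned on."""
    if mode not in ("session", "transaction"):
        raise ValueError("mode must be 'session' or 'transaction'")
    updated = dict(spec)  # shallow copy is enough: only top-level keys are set
    updated["enableConnectionPooler"] = True                 # pooler for the primary
    updated["enableReplicaConnectionPooler"] = for_replicas  # optional pooler for replicas
    updated["connectionPooler"] = {"numberOfInstances": instances, "mode": mode}
    return updated


if __name__ == "__main__":
    print(enable_pooler({"teamId": "acid", "numberOfInstances": 2}))
```

Transaction mode lets many clients share a few server connections. Session mode is safer for clients that rely on session state such as prepared statements or advisory locks.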