May 10, 2026

Zalando Postgres Operator — Production PostgreSQL Clusters on Kubernetes

A Kubernetes operator that manages the full lifecycle of PostgreSQL clusters with Patroni-based HA, connection pooling, and automated backups.

Introduction

The Zalando Postgres Operator manages PostgreSQL clusters on Kubernetes, with each cluster declared as a custom resource. Developed and battle-tested at Zalando for running hundreds of PostgreSQL instances in production, it handles cluster creation, high availability via Patroni, connection pooling, rolling updates, and automated WAL-based backups.

What Zalando Postgres Operator Does

  • Creates and manages multi-replica PostgreSQL clusters as Kubernetes custom resources
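  • Provides automatic failover and high availability using Patroni, which elects a leader through a distributed configuration store (by default the Kubernetes API)
  • Deploys and configures PgBouncer connection pooling automatically when it is enabled for a cluster
  • Rolls out minor version updates as rolling pod updates and supports in-place major version upgrades with limited downtime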
  • Manages continuous WAL archiving and base backups to S3-compatible storage

Architecture Overview

The operator watches for postgresql custom resources and reconciles the actual state toward the declared one. Each cluster runs as a StatefulSet whose pods run Patroni, which performs leader election through Kubernetes objects (Endpoints or ConfigMaps) or an external etcd. When connection pooling is enabled, PgBouncer runs as a separate Deployment in front of the cluster rather than inside the database pods. The operator creates Services for the primary and the replicas, manages Secrets for database credentials, and, if logical backups are enabled, schedules them as a CronJob. WAL-E or WAL-G handles continuous archiving to object storage.
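As a rough illustration of what the operator creates for one cluster, the sketch below derives the object names from a cluster name. It assumes the operator's default naming templates (a `-repl` suffix for the replica Service and the `credentials.postgresql.acid.zalan.do` suffix for Secrets); both can be changed in the operator configuration. The function `derived_resource_names` is a hypothetical helper, not part of the operator.

```python
def derived_resource_names(cluster: str, users: list[str]) -> dict:
    """Return the object names expected under the operator's default templates.

    Assumption: default naming is in effect. Usernames containing characters
    that are not valid in Secret names may be normalized by the operator;
    that case is not handled in this sketch.
    """
    return {
        "statefulset": cluster,
        "primary_service": cluster,
        "replica_service": f"{cluster}-repl",
        "credential_secrets": [
            f"{user}.{cluster}.credentials.postgresql.acid.zalan.do"
            for user in users
        ],
    }


names = derived_resource_names("acid-minimal-cluster", ["postgres", "zalando"])
for kind, value in names.items():
    print(kind, value)
```

Applications would normally connect to the primary Service for writes and to the replica Service for read-only traffic, reading the password from the matching Secret.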

Self-Hosting & Configuration

  • Deploy the operator via kubectl, Helm chart, or Kustomize manifests
  • Configure S3/GCS/Azure Blob credentials for WAL archiving and base backups
  • Set global defaults in the OperatorConfiguration custom resource or the operator ConfigMap (default resource requests and limits, storage classes, backup schedules)
  • Define per-cluster settings (replicas, resources, PostgreSQL parameters) in the postgresql CR
  • Enable the operator UI for a web-based view of all managed clusters
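The per-cluster settings listed above live in a postgresql custom resource. The following is a minimal sketch that builds such a manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f -` also accepts. The cluster name, team ID, sizes, and parameter values are illustrative assumptions, and `build_cluster_manifest` is a hypothetical helper. The check that the name starts with the team ID reflects the operator's default naming rule.

```python
import json


def build_cluster_manifest(name: str, team_id: str, instances: int = 2,
                           pg_version: str = "16",
                           volume_size: str = "10Gi") -> dict:
    """Build a minimal postgresql custom resource (illustrative values only)."""
    # By default the operator expects cluster names of the form "<teamId>-...".
    if not name.startswith(f"{team_id}-"):
        raise ValueError("cluster name must start with '<teamId>-'")
    return {
        "apiVersion": "acid.zalan.do/v1",
        "kind": "postgresql",
        "metadata": {"name": name},
        "spec": {
            "teamId": team_id,
            "numberOfInstances": instances,
            "volume": {"size": volume_size},
            "postgresql": {
                "version": pg_version,
                # PostgreSQL parameters are passed as strings.
                "parameters": {"max_connections": "200"},
            },
            "resources": {
                "requests": {"cpu": "500m", "memory": "1Gi"},
                "limits": {"cpu": "1", "memory": "2Gi"},
            },
        },
    }


manifest = build_cluster_manifest("acid-demo", "acid")
print(json.dumps(manifest, indent=2))
```

Anything omitted from the spec falls back to the global defaults set in the operator configuration.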

Key Features

  • Patroni-based HA with automatic leader election and replica promotion
  • Built-in PgBouncer connection pooling with transparent reconfiguration on failover
  • Logical and physical backup strategies with point-in-time recovery
  • Clone-from-backup: spin up new clusters from existing WAL archives
  • Team-based access control: maps the members of the team that owns a cluster (identified by its teamId) to database roles automatically
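Clone-from-backup and point-in-time recovery are requested through a clone section in the new cluster's spec. The sketch below builds that section, assuming the source cluster's WAL archive is reachable from the new cluster and that the recovery target must carry an explicit time zone. The cluster name and UID are placeholders, and `build_clone_section` is a hypothetical helper.

```python
from datetime import datetime, timezone


def build_clone_section(source_cluster: str, source_uid: str,
                        target_time: datetime) -> dict:
    """Build the clone section for a point-in-time clone (sketch)."""
    # Assumption: the timestamp needs an explicit UTC offset, so naive
    # datetimes are rejected here rather than guessed at.
    if target_time.tzinfo is None:
        raise ValueError("target_time must be timezone-aware")
    return {
        "cluster": source_cluster,
        "uid": source_uid,
        "timestamp": target_time.isoformat(),
    }


clone = build_clone_section(
    "acid-source",                               # placeholder source cluster
    "00000000-0000-0000-0000-000000000000",      # placeholder UID of that cluster
    datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc),
)
print(clone)
```

The resulting dictionary would go under `spec.clone` of the new cluster's manifest.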

Comparison with Similar Tools

  • CloudNativePG: newer operator with declarative backup management that handles failover itself; the Zalando operator has a longer production history and relies on Patroni for HA
  • Crunchy Data PGO: comprehensive operator built around pgBackRest; the Zalando operator uses WAL-G and exposes a smaller CRD surface
  • KubeDB — multi-database operator; Zalando operator is PostgreSQL-specialized
  • Bitnami PostgreSQL Helm chart — simple StatefulSet deployment without operator-level lifecycle management

FAQ

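Q: How does failover work? A: Patroni runs inside each PostgreSQL pod and holds a leader lock through the Kubernetes API. When the primary stops renewing the lock, it expires after its TTL and Patroni promotes the most up-to-date healthy replica, so failover time is governed mainly by the configured TTL rather than being instantaneous.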
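The timing knobs behind this answer are Patroni's `ttl`, `loop_wait`, and `retry_timeout`, which can be set in the `patroni` section of the cluster spec. Patroni's documentation states the constraint `loop_wait + 2 * retry_timeout <= ttl`; the sketch below checks a candidate configuration against it. The values shown are Patroni's commonly cited defaults, used here as an assumption, and the function name is hypothetical.

```python
def patroni_timing_is_consistent(ttl: int, loop_wait: int,
                                 retry_timeout: int) -> bool:
    """Check the documented Patroni rule: loop_wait + 2 * retry_timeout <= ttl."""
    return loop_wait + 2 * retry_timeout <= ttl


# Assumed defaults, all in seconds: the leader lock expires 30 s after the
# last successful renewal, and Patroni runs its loop every 10 s.
default_timing = {"ttl": 30, "loop_wait": 10, "retry_timeout": 10}
aggressive_timing = {"ttl": 20, "loop_wait": 10, "retry_timeout": 10}

default_ok = patroni_timing_is_consistent(**default_timing)
aggressive_ok = patroni_timing_is_consistent(**aggressive_timing)
print(default_ok, aggressive_ok)
```

Lowering `ttl` shortens failover but leaves less room for transient API errors, which is why the three values have to be tuned together.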

Q: Can I use it with existing PostgreSQL databases? A: You can clone from an existing WAL archive or logical backup, but direct import of a running external database requires manual migration steps.

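Q: What PostgreSQL versions are supported? A: Recent releases cover PostgreSQL versions in the 12 through 17 range; the exact set depends on the operator release and the Spilo image it ships with. The version is specified in the cluster CR. Minor updates arrive as rolling pod updates, while a major version change is an in-place upgrade that runs only if major version upgrades are enabled in the operator configuration.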

Q: Does it support connection pooling? A: Yes. When pooling is enabled in the cluster spec, the operator deploys PgBouncer as a separate Deployment with its own Service and configures it automatically.
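As a sketch of what enabling the pooler might look like, the helper below adds the relevant fields to an existing cluster spec without mutating it. The field names `enableConnectionPooler` and `connectionPooler` follow the cluster manifest reference; the instance count, the set of accepted modes, and the helper `with_connection_pooler` itself are assumptions made for illustration.

```python
import copy


def with_connection_pooler(spec: dict, instances: int = 2,
                           mode: str = "transaction") -> dict:
    """Return a copy of a cluster spec with PgBouncer pooling switched on."""
    # Assumption: only these two PgBouncer pool modes are accepted.
    if mode not in ("session", "transaction"):
        raise ValueError("mode must be 'session' or 'transaction'")
    new_spec = copy.deepcopy(spec)
    new_spec["enableConnectionPooler"] = True
    new_spec["connectionPooler"] = {
        "numberOfInstances": instances,
        "mode": mode,
    }
    return new_spec


base_spec = {"teamId": "acid", "numberOfInstances": 2}
pooled_spec = with_connection_pooler(base_spec)
print(pooled_spec)
```

With default naming, clients would then connect through a Service named after the cluster with a `-pooler` suffix instead of the primary Service.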
