Scripts · May 10, 2026 · 1 min read

Zalando Postgres Operator — Production PostgreSQL Clusters on Kubernetes

A Kubernetes operator that manages the full lifecycle of PostgreSQL clusters with Patroni-based HA, connection pooling, and automated backups.

Introduction

The Zalando Postgres Operator manages PostgreSQL clusters on Kubernetes as a custom resource. Developed and battle-tested at Zalando for running hundreds of PostgreSQL instances in production, it handles cluster creation, high availability via Patroni, connection pooling, rolling upgrades, and automated WAL-based backups.

What Zalando Postgres Operator Does

  • Creates and manages multi-replica PostgreSQL clusters as Kubernetes custom resources
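  • Provides automatic failover and high availability through Patroni, which elects a leader via the Kubernetes API or etcd
  • Optionally deploys PgBouncer connection poolers in front of a cluster, configured from the cluster manifest
  • Handles minor version updates through rolling pod replacement and major version upgrades in place, with minimal downtime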
  • Manages continuous WAL archiving and base backups to S3-compatible storage

Architecture Overview

The operator watches for postgresql custom resources and reconciles the actual state of each cluster with the desired state. Each cluster runs as a StatefulSet, with Patroni managing leader election via Kubernetes endpoints or etcd. When connection pooling is enabled, PgBouncer runs as a separate Deployment in front of the cluster. The operator creates services for the primary and replica endpoints, manages secrets for database credentials, and configures logical backup CronJobs. WAL-E or WAL-G handles continuous backup to object storage.
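
To make the service and secret layout concrete, here is a minimal sketch of how the generated names can be derived. It assumes the operator's default naming (a `-repl` suffix for the replica service and the default secret name template), the default `cluster.local` DNS domain, and that underscores in role names become dashes in secret names. The helper functions are hypothetical, so check the output against your operator release.

```python
def cluster_endpoints(cluster: str, namespace: str = "default") -> dict:
    """Return in-cluster DNS names for the primary and replica services.

    Assumes the default cluster domain `cluster.local`.
    """
    base = f"{namespace}.svc.cluster.local"
    return {
        "primary": f"{cluster}.{base}",
        "replica": f"{cluster}-repl.{base}",
    }


def credentials_secret(username: str, cluster: str) -> str:
    """Return the secret name under the default template.

    Default template (assumed unchanged):
    {username}.{cluster}.credentials.{tprkind}.{tprgroup}
    Underscores are not valid in Kubernetes object names, so they are
    replaced with dashes here.
    """
    safe_user = username.replace("_", "-")
    return f"{safe_user}.{cluster}.credentials.postgresql.acid.zalan.do"


if __name__ == "__main__":
    print(cluster_endpoints("acid-minimal-cluster"))
    print(credentials_secret("foo_user", "acid-minimal-cluster"))
```

An application would connect to the primary name for writes and to the replica name for read-only traffic, reading its password from the matching secret.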

Self-Hosting & Configuration

  • Deploy the operator via kubectl, Helm chart, or Kustomize manifests
  • Configure S3/GCS/Azure Blob credentials for WAL archiving and base backups
  • Set global defaults in the OperatorConfiguration CRD (instance sizes, storage classes, backup schedules)
  • Define per-cluster settings (replicas, resources, PostgreSQL parameters) in the postgresql CR
  • Enable the operator UI for a web-based view of all managed clusters
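
The per-cluster settings above can be sketched as a small Python helper that builds a postgresql manifest and prints it as JSON, which kubectl accepts as well as YAML. The cluster name, team ID, sizes, and parameter values are illustrative assumptions, and the helper `minimal_cluster` is hypothetical. The name check reflects the operator's default rule that a cluster name starts with its team ID followed by a dash.

```python
import json


def minimal_cluster(name: str, team_id: str, instances: int = 2,
                    volume_size: str = "1Gi", pg_version: str = "16") -> dict:
    """Build a minimal postgresql custom resource as a plain dict."""
    # By default the operator expects "<teamId>-<something>" as the name.
    if not name.startswith(team_id + "-"):
        raise ValueError(f"cluster name must start with '{team_id}-'")
    if instances < 1:
        raise ValueError("instances must be at least 1")
    return {
        "apiVersion": "acid.zalan.do/v1",
        "kind": "postgresql",
        "metadata": {"name": name},
        "spec": {
            "teamId": team_id,
            "numberOfInstances": instances,
            "volume": {"size": volume_size},
            "postgresql": {
                "version": pg_version,
                # PostgreSQL parameters are passed as strings.
                "parameters": {"max_connections": "100"},
            },
            # Illustrative resource values, not recommendations.
            "resources": {
                "requests": {"cpu": "100m", "memory": "256Mi"},
                "limits": {"cpu": "1", "memory": "1Gi"},
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(minimal_cluster("acid-minimal-cluster", "acid"), indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, assuming the operator and its CRD are already installed in the cluster.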

Key Features

  • Patroni-based HA with automatic leader election and replica promotion
  • Built-in PgBouncer connection pooling with transparent reconfiguration on failover
  • Logical and physical backup strategies with point-in-time recovery
  • Clone-from-backup: spin up new clusters from existing WAL archives
  • Team-based access control: maps team membership (from a Teams API or PostgresTeam resources) to database roles automatically
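
As a rough illustration of clone-from-backup, the sketch below adds a `clone` section to a cluster spec. The `cluster`, `uid`, and `timestamp` fields follow the operator's cluster manifest reference as far as I know. The helper `with_clone`, the source cluster name, and the all-zero UID are hypothetical placeholders. The timestamp check assumes that point-in-time recovery needs an explicit time zone offset.

```python
from datetime import datetime
from typing import Optional


def with_clone(spec: dict, source_cluster: str, source_uid: str,
               timestamp: Optional[str] = None) -> dict:
    """Return a copy of `spec` that clones from another cluster's backups.

    With a timestamp, the new cluster is restored from the WAL archive up
    to that point in time; the source UID helps locate the archive.
    """
    clone = {"cluster": source_cluster, "uid": source_uid}
    if timestamp is not None:
        parsed = datetime.fromisoformat(timestamp)
        if parsed.tzinfo is None:
            raise ValueError("timestamp must include a time zone offset")
        clone["timestamp"] = timestamp
    new_spec = dict(spec)  # shallow copy so the input is left untouched
    new_spec["clone"] = clone
    return new_spec


if __name__ == "__main__":
    base = {"teamId": "acid", "numberOfInstances": 2}
    print(with_clone(
        base,
        source_cluster="acid-minimal-cluster",               # placeholder
        source_uid="00000000-0000-0000-0000-000000000000",   # placeholder
        timestamp="2026-05-01T12:00:00+00:00",
    ))
```

The returned spec goes under `spec` in a new postgresql resource; the source cluster does not need to be running as long as its WAL archive is still available.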

Comparison with Similar Tools

  • CloudNativePG — newer operator with declarative backup management; Zalando operator is more mature with Patroni-based HA
  • CrunchyData PGO — comprehensive operator with pgBackRest; Zalando operator uses WAL-G and has simpler CRD semantics
  • KubeDB — multi-database operator; Zalando operator is PostgreSQL-specialized
  • Bitnami PostgreSQL Helm chart — simple StatefulSet deployment without operator-level lifecycle management

FAQ

Q: How does failover work? A: Patroni runs inside each PostgreSQL pod and uses Kubernetes endpoints for leader election. When the primary fails, its leader lock expires and Patroni promotes the most up-to-date healthy replica, usually within seconds to tens of seconds depending on Patroni's ttl setting.
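
The idea of promoting the most up-to-date replica can be illustrated with a toy model. This is not Patroni's implementation: in reality the members race for a leader lock and compare their WAL positions themselves. The `Member` class and `pick_candidate` function are hypothetical; the sketch only shows how PostgreSQL LSNs in the usual `X/Y` hexadecimal form can be compared.

```python
from dataclasses import dataclass
from typing import List, Optional


def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '0/3000148' to a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


@dataclass
class Member:
    name: str
    replayed_lsn: str   # last WAL position replayed by this replica
    healthy: bool = True


def pick_candidate(members: List[Member]) -> Optional[Member]:
    """Pick the healthy replica that has replayed the most WAL."""
    eligible = [m for m in members if m.healthy]
    if not eligible:
        return None
    return max(eligible, key=lambda m: parse_lsn(m.replayed_lsn))


if __name__ == "__main__":
    replicas = [
        Member("pg-1", "0/3000060"),
        Member("pg-2", "0/3000148"),
        Member("pg-3", "1/0", healthy=False),  # furthest ahead but unhealthy
    ]
    winner = pick_candidate(replicas)
    print(winner.name if winner else "no candidate")
```

Comparing LSNs as integers rather than strings matters here, because a string comparison would rank `0/A000000` above `0/10000000`.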

Q: Can I use it with existing PostgreSQL databases? A: You can clone from an existing WAL archive or logical backup, but direct import of a running external database requires manual migration steps.

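Q: What PostgreSQL versions are supported? A: The operator supports PostgreSQL 12 through 17, although the exact range depends on the operator release and its Spilo image. The version is specified in the cluster CR. Raising it triggers an in-place major version upgrade when the operator's major version upgrade mode allows it, while minor versions change through a rolling update of the pods.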

Q: Does it support connection pooling? A: Yes. The operator can deploy PgBouncer as a separate Deployment with its own Service, configured automatically based on the cluster spec.
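
As a sketch of what enabling the pooler looks like in a cluster spec, the helper below sets the `enableConnectionPooler` flag and a `connectionPooler` section. The field names follow the operator's cluster manifest reference as far as I know; the helper `enable_pooler` and its default values are hypothetical.

```python
def enable_pooler(spec: dict, instances: int = 2, mode: str = "transaction",
                  replica_pooler: bool = False) -> dict:
    """Return a copy of `spec` with PgBouncer connection pooling enabled."""
    # The operator documents two PgBouncer pool modes.
    if mode not in ("session", "transaction"):
        raise ValueError("mode must be 'session' or 'transaction'")
    if instances < 1:
        raise ValueError("instances must be at least 1")
    new_spec = dict(spec)  # shallow copy so the input is left untouched
    new_spec["enableConnectionPooler"] = True
    if replica_pooler:
        # Optional second pooler in front of the replica service.
        new_spec["enableReplicaConnectionPooler"] = True
    new_spec["connectionPooler"] = {
        "numberOfInstances": instances,
        "mode": mode,
    }
    return new_spec


if __name__ == "__main__":
    print(enable_pooler({"teamId": "acid", "numberOfInstances": 2}))
```

Clients then connect to the pooler's service instead of the primary service; with default naming this is assumed to be the cluster name with a `-pooler` suffix.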
