Apr 21, 2026 · 3 min read

Apache ZooKeeper — Distributed Coordination Service for Reliable Systems

A centralized service for maintaining configuration, naming, synchronization, and group services across distributed applications in the Hadoop and Kafka ecosystems.

Introduction

Apache ZooKeeper provides a reliable coordination kernel for distributed systems. It exposes a simple tree-structured namespace (znodes) with strong ordering and consistency guarantees, enabling applications to implement leader election, distributed locking, configuration management, and service discovery. ZooKeeper underpins critical infrastructure including Apache Kafka, HBase, Hadoop YARN, and Solr.

What ZooKeeper Does

  • Maintains a hierarchical key-value store (znodes) with strong consistency via ZAB consensus
  • Provides ephemeral nodes and watches for real-time service discovery and health detection
  • Enables distributed locks, barriers, and leader election through recipe patterns
  • Delivers linearizable writes and sequentially consistent reads across an ensemble
  • Supports ACL-based access control on every znode for multi-tenant environments

Architecture Overview

A ZooKeeper ensemble consists of an odd number of servers (typically 3 or 5) that replicate state using the ZAB (ZooKeeper Atomic Broadcast) protocol. One server is elected leader and handles all write requests; followers replicate the transaction log and serve reads. Clients connect to any server and receive session guarantees including ordered updates and ephemeral node lifecycle tied to session liveness.
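The majority-quorum rule behind ensemble sizing is easy to express directly. The helper names below are illustrative, not part of any ZooKeeper API:

```python
# Quorum math for a ZooKeeper ensemble: a write commits once a strict
# majority of servers have logged it, so an n-server ensemble tolerates
# the failure of n - majority(n) members.

def majority(n: int) -> int:
    """Smallest number of servers that forms a quorum in an n-server ensemble."""
    return n // 2 + 1

def fault_tolerance(n: int) -> int:
    """How many servers can fail while the ensemble still accepts writes."""
    return n - majority(n)

for n in (3, 4, 5):
    print(f"{n} servers: quorum={majority(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

Note that a 4-server ensemble tolerates only one failure, the same as 3 servers, while costing an extra machine; this is why odd sizes are the standard recommendation.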

Self-Hosting & Configuration

  • Deploy an odd number of servers (3 or 5) for quorum-based fault tolerance
  • Configure zoo.cfg with dataDir, clientPort, and server.N entries for each node
  • Tune tickTime, initLimit, and syncLimit for network and disk latency
  • Enable TLS by setting secureClientPort, the Netty serverCnxnFactory, and the ssl.keyStore.location / ssl.trustStore.location properties for encrypted client-server traffic
  • Monitor with the four-letter commands (ruok, stat, mntr) or the AdminServer HTTP endpoint
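Putting the items above together, a minimal three-node zoo.cfg (assuming hypothetical hosts zk1–zk3 and the default ports) might look like:

```
# Base time unit in milliseconds; session timeouts are multiples of this
tickTime=2000
# Ticks a follower may take to connect and sync with the leader at startup
initLimit=10
# Ticks a follower may lag behind the leader before being dropped
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
# server.N=host:peerPort:leaderElectionPort — one entry per ensemble member
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
```

Each server additionally needs a `myid` file in dataDir containing its own N. Once the four-letter-word whitelist (`4lw.commands.whitelist`) permits it, health can be checked with `echo ruok | nc zk1 2181`, which returns `imok` on a healthy server.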

Key Features

  • Sub-millisecond latency for reads; writes are serialized through the leader for consistency
  • Watches notify clients of data changes without polling
  • Sequential znodes provide globally ordered identifiers for distributed queues
  • Transaction support allows atomic multi-operation updates
  • Battle-tested at scale in production by Kafka, HBase, and hundreds of large deployments
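Sequential znodes receive a monotonically increasing, zero-padded 10-digit suffix, so lexicographic order equals creation order. A pure-Python sketch of the ordering step of the lock recipe follows; it mimics only the naming convention and makes no ZooKeeper calls:

```python
# ZooKeeper appends a zero-padded 10-digit counter to sequential znodes,
# e.g. lock-0000000007, so sorting names lexicographically yields creation
# order. In the lock recipe, the client holding the lowest sequence holds
# the lock; each other client watches the node immediately below its own.

def sequential_name(prefix: str, counter: int) -> str:
    # Mirrors ZooKeeper's naming: prefix + 10-digit zero-padded counter.
    return f"{prefix}{counter:010d}"

def lock_order(znodes: list[str]) -> list[str]:
    # Zero-padding makes lexicographic sort equal numeric sort.
    return sorted(znodes)

names = [sequential_name("lock-", c) for c in (12, 3, 7)]
ordered = lock_order(names)
holder, waiters = ordered[0], ordered[1:]  # lowest sequence holds the lock
```

Watching only the immediate predecessor (rather than the whole directory) avoids the "herd effect" of every waiter waking on every release.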

Comparison with Similar Tools

  • etcd — newer, gRPC-based; ZooKeeper has deeper ecosystem integration with Hadoop and Kafka
  • Consul — combines service mesh with KV store; ZooKeeper focuses purely on coordination primitives
  • Chubby (Google) — ZooKeeper is the open-source analog of Google's internal lock service
  • Raft-based systems — ZooKeeper uses ZAB, which predates Raft but provides similar guarantees

FAQ

Q: Why do I need an odd number of servers? A: ZooKeeper requires a majority quorum for writes. An odd ensemble size (3, 5, 7) maximizes fault tolerance per server count.

Q: Is ZooKeeper still relevant with etcd and Raft? A: Yes. Kafka depended on ZooKeeper until KRaft mode replaced it, many Hadoop-ecosystem tools still require it, and the installed base of existing deployments is vast.

Q: How does ZooKeeper handle network partitions? A: Servers that lose quorum stop serving writes. Clients on the minority side receive session expiration events and must reconnect to a majority partition.

Q: Can I use ZooKeeper for service discovery? A: Yes. Services register ephemeral znodes that automatically disappear when the session ends, and watchers notify consumers of topology changes.
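The ephemeral-node pattern in this answer can be sketched without a live ensemble. The in-memory Registry below is a stand-in for a znode parent path, not ZooKeeper's API; against a real ensemble you would use a client library's ephemeral create (e.g. kazoo's create(path, ephemeral=True)) plus a children watch:

```python
# Toy model of ZooKeeper-style service discovery: services register
# ephemeral entries tied to a session; when a session expires, its entries
# vanish and registered watchers are notified of the new member list.
# (Illustrative stand-in for a znode parent path — not ZooKeeper's API.)

class Registry:
    def __init__(self):
        self._members = {}      # service name -> owning session id
        self._watchers = []     # callbacks fired on membership change

    def watch(self, callback):
        self._watchers.append(callback)

    def register(self, session_id, name):
        # Analogous to create("/services/" + name, ephemeral=True).
        self._members[name] = session_id
        self._notify()

    def expire_session(self, session_id):
        # Session loss deletes every ephemeral node the session owned.
        self._members = {n: s for n, s in self._members.items() if s != session_id}
        self._notify()

    def _notify(self):
        members = sorted(self._members)
        for cb in self._watchers:
            cb(members)

seen = []
reg = Registry()
reg.watch(seen.append)
reg.register(session_id=1, name="api-1")
reg.register(session_id=2, name="api-2")
reg.expire_session(1)  # api-1's ephemeral entry disappears
# seen == [["api-1"], ["api-1", "api-2"], ["api-2"]]
```

A key difference from polling-based discovery: consumers learn of membership changes via the watch callback, so stale endpoints are dropped as soon as the owning session expires rather than at the next poll interval.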
