
etcd — Distributed Reliable Key-Value Store for Critical Data

etcd is a strongly consistent, distributed key-value store for configuration, service discovery, and coordination. It uses the Raft consensus algorithm and powers Kubernetes, OpenShift, CoreOS, and many other distributed systems.

TL;DR
etcd is a strongly consistent, distributed key-value store that powers Kubernetes configuration, service discovery, and coordination.

What it is

etcd is an open-source, strongly consistent, distributed key-value store written in Go. It uses the Raft consensus algorithm to ensure data reliability across a cluster of machines. etcd is the backbone of Kubernetes, storing all cluster state, configuration, and metadata.

Beyond Kubernetes, etcd serves as a building block for service discovery, distributed locking, leader election, and configuration management in any distributed system that needs a reliable source of truth.
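
As a concrete sketch of those building blocks, the lease and lock primitives behind service discovery and distributed locking are exposed directly through etcdctl (the key paths and TTL are illustrative, and your lease ID will differ):

# Register a service under a 30-second lease; the key disappears
# automatically if the service stops renewing the lease
etcdctl lease grant 30
# lease 694d77aa9e38b807 granted with TTL(30s)
etcdctl put --lease=694d77aa9e38b807 /services/api/10.0.0.5 'alive'
etcdctl lease keep-alive 694d77aa9e38b807

# Distributed lock: blocks until acquired, then prints the lock's key
etcdctl lock /locks/schema-migration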

How it saves time or tokens

Building a distributed consensus system from scratch is a multi-year effort. etcd provides a battle-tested implementation of Raft with a simple key-value API. Teams adopt etcd instead of implementing their own coordination layer, saving months of engineering. Its watch API enables reactive architectures where services respond to configuration changes instantly rather than polling.
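
As a sketch of that reactive pattern, etcdctl can block on a watch and run a command on every change; the echo here stands in for whatever reload logic your service needs:

# Re-run a command each time the key changes, instead of polling it
etcdctl watch /config/db_host -- sh -c 'echo "db_host changed, reloading"'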

How to use

  1. Download the etcd binary from the official releases page or install via your package manager.
  2. Start a single-node cluster for development with etcd (no flags needed) or configure a multi-node cluster with peer URLs (see the sketch after this list).
  3. Use etcdctl to read and write keys: etcdctl put mykey myvalue and etcdctl get mykey.
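
For step 2, a minimal static-bootstrap sketch for one node of a three-node cluster; the node names and addresses are placeholders, and the analogous command runs on each host:

# Node infra0 of a 3-node cluster (repeat on infra1 and infra2,
# changing --name and the advertised URLs)
etcd --name infra0 \
     --initial-advertise-peer-urls http://10.0.1.10:2380 \
     --listen-peer-urls http://10.0.1.10:2380 \
     --listen-client-urls http://10.0.1.10:2379,http://127.0.0.1:2379 \
     --advertise-client-urls http://10.0.1.10:2379 \
     --initial-cluster-token etcd-cluster-1 \
     --initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
     --initial-cluster-state new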

Example

# Start etcd locally
etcd --listen-client-urls http://localhost:2379 \
     --advertise-client-urls http://localhost:2379

# Put and get a key
etcdctl put /config/db_host '192.168.1.100'
etcdctl get /config/db_host
# Output: /config/db_host
# 192.168.1.100

# Watch for changes
etcdctl watch /config/ --prefix

Common pitfalls

  • etcd is not a general-purpose database. It is designed for small amounts of critical metadata (the default request size limit is 1.5 MiB). Do not store large blobs or high-volume transactional data.
  • Cluster sizing matters. A 3-node cluster tolerates 1 failure, a 5-node cluster tolerates 2. Always run an odd number of nodes.
  • Disk I/O latency directly affects etcd performance. Use SSDs for the data directory. Slow disks cause leader election timeouts and cluster instability; the status check below helps catch this early.
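
A quick way to spot quorum and latency trouble is etcdctl's endpoint commands, assuming etcdctl can reach all members:

# Per-member health plus status (DB size, leader, raft term)
etcdctl endpoint health --cluster
etcdctl endpoint status --cluster -w table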

Frequently Asked Questions

Why does Kubernetes use etcd?

Kubernetes needs a reliable, consistent store for all cluster state: pod definitions, service endpoints, secrets, and config maps. etcd provides strong consistency guarantees via Raft consensus, which ensures that all Kubernetes API servers see the same data even during network partitions or node failures.
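
You can see this directly: Kubernetes keeps every object under the /registry prefix, so listing keys straight from etcd shows the raw cluster state (on a real cluster you would also pass the API server's TLS client certificates):

# List pod keys as Kubernetes stores them, without fetching values
etcdctl get /registry/pods --prefix --keys-only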

How many etcd nodes should I run?

Run 3 nodes for most production deployments (tolerates 1 failure). Run 5 nodes if you need higher fault tolerance (tolerates 2 failures). More nodes increase write latency because Raft requires a majority quorum. Never run an even number of nodes: a 4-node cluster still tolerates only 1 failure while raising the quorum size to 3.

Can etcd replace Redis or Consul?

etcd and Redis serve different purposes. Redis is an in-memory data structure store optimized for speed and variety of data types. etcd is optimized for consistency and reliability of small configuration data. Consul overlaps more with etcd for service discovery but adds health checking and a service mesh.

What happens when an etcd node goes down?

If a minority of nodes fail, the cluster continues operating normally. Reads and writes proceed through the remaining majority. When the failed node comes back, it automatically syncs with the cluster. If a majority fails, the cluster loses quorum: writes and linearizable reads are rejected, and only stale serializable reads can be served until quorum is restored.

How do I back up etcd?

Use etcdctl snapshot save to create a point-in-time snapshot of the entire datastore. Store these snapshots off-cluster. To restore, use etcdctl snapshot restore. For Kubernetes clusters, this is the primary disaster recovery mechanism for cluster state.
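
A minimal backup-and-restore sketch; the paths are illustrative, and in etcd v3.5+ the restore step has moved to the separate etcdutl binary:

# Point-in-time backup of the entire keyspace
etcdctl snapshot save /backups/etcd-snapshot.db

# Restore into a fresh data directory, then start etcd pointing at it
etcdctl snapshot restore /backups/etcd-snapshot.db --data-dir /var/lib/etcd-restored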
