etcd — Distributed Reliable Key-Value Store for Critical Data
etcd is a strongly consistent, distributed key-value store for configuration, service discovery, and coordination. It uses the Raft consensus algorithm and powers Kubernetes, OpenShift, CoreOS, and many other distributed systems.
What it is
etcd is an open-source, strongly consistent, distributed key-value store written in Go. It uses the Raft consensus algorithm to ensure data reliability across a cluster of machines. etcd is the backbone of Kubernetes, storing all cluster state, configuration, and metadata.
Beyond Kubernetes, etcd serves as a building block for service discovery, distributed locking, leader election, and configuration management in any distributed system that needs a reliable source of truth.
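The coordination patterns above map onto a few etcdctl subcommands (lease grant, put --lease, get --prefix, lock). What follows is a minimal sketch rather than a reference implementation. It assumes etcdctl (v3 API) is on the PATH and can reach a running cluster. The helper names and the /services/ key layout are hypothetical, chosen only for illustration.

```shell
# Hypothetical helpers. ETCDCTL can be overridden, e.g. ETCDCTL=echo for a dry run.

# Extract the lease ID from `etcdctl lease grant` output, which looks like:
#   lease 694d77aa9e38260f granted with TTL(60s)
lease_id_from_grant() {
  awk '{ print $2 }'
}

# Service discovery: register an instance under a key bound to a lease.
# The key disappears when the lease expires, unless the owner renews it
# (for example with `etcdctl lease keep-alive <lease-id>`).
# Usage: register_service <name> <instance-id> <address> <ttl-seconds>
register_service() (
  name=$1; instance=$2; addr=$3; ttl=$4
  lease=$("${ETCDCTL:-etcdctl}" lease grant "$ttl" | lease_id_from_grant)
  "${ETCDCTL:-etcdctl}" put --lease="$lease" "/services/$name/$instance" "$addr"
)

# List every live instance of a service by reading the key prefix.
# Usage: discover_service <name>
discover_service() (
  "${ETCDCTL:-etcdctl}" get --prefix "/services/$1/"
)

# Distributed locking: run a command while holding a named lock.
# Usage: with_lock <lock-name> <command> [args...]
with_lock() (
  "${ETCDCTL:-etcdctl}" lock "$@"
)
```

For example, register_service web a 10.0.0.5:8080 30 would write /services/web/a with a 30-second lease, and with_lock /locks/migrate ./migrate.sh would run a (hypothetical) migration script under a cluster-wide lock.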
How it saves time or tokens
Building a distributed consensus system from scratch is a multi-year effort. etcd provides a battle-tested implementation of Raft behind a simple key-value API, so teams can adopt it instead of implementing their own coordination layer and save months of engineering. Its watch API enables reactive architectures in which services are notified of configuration changes as they are committed, rather than polling for them.
How to use
- Download the etcd binary from the official releases page or install via your package manager.
- Start a single-node cluster for development with etcd (no flags needed) or configure a multi-node cluster with peer URLs.
- Use etcdctl to read and write keys: etcdctl put mykey myvalue and etcdctl get mykey.
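For the multi-node case, each member is usually started with the same static list of peer URLs. The sketch below only prints the command for one member and does not start etcd. The member names (infra0, infra1, infra2), IP addresses, and cluster token are hypothetical. The flags are etcd's static bootstrap flags, and 2380 and 2379 are its default peer and client ports.

```shell
# Build the value of --initial-cluster from "name=ip" pairs.
# Usage: initial_cluster infra0=10.0.1.10 infra1=10.0.1.11 ...
initial_cluster() (
  out=""
  for pair in "$@"; do
    name=${pair%%=*}   # text before the first "="
    ip=${pair#*=}      # text after the first "="
    out="${out:+$out,}$name=http://$ip:2380"
  done
  printf '%s\n' "$out"
)

# Print (not run) the etcd command line for one member.
# Usage: member_command <this-name> <this-ip> <name=ip>...
member_command() (
  name=$1; ip=$2; shift 2
  printf 'etcd --name %s' "$name"
  printf ' --initial-advertise-peer-urls http://%s:2380' "$ip"
  printf ' --listen-peer-urls http://%s:2380' "$ip"
  printf ' --listen-client-urls http://%s:2379,http://127.0.0.1:2379' "$ip"
  printf ' --advertise-client-urls http://%s:2379' "$ip"
  printf ' --initial-cluster-token %s' "etcd-cluster-1"
  printf ' --initial-cluster %s' "$(initial_cluster "$@")"
  printf ' --initial-cluster-state new\n'
)

# Command for the first of three hypothetical members.
member_command infra0 10.0.1.10 \
  infra0=10.0.1.10 infra1=10.0.1.11 infra2=10.0.1.12
```

Running the printed command on each machine, with that machine's own name and IP, would form the cluster. The plain http URLs are for illustration only, and a production setup would normally use TLS.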
Example
# Start etcd locally
etcd --listen-client-urls http://localhost:2379 \
  --advertise-client-urls http://localhost:2379
# Put and get a key
etcdctl put /config/db_host '192.168.1.100'
etcdctl get /config/db_host
# Output: /config/db_host
# 192.168.1.100
# Watch for changes
etcdctl watch /config/ --prefix
Related on TokRepo
- DevOps tools — Infrastructure and deployment automation
- Self-hosted solutions — Run critical services on your own infrastructure
Common pitfalls
- etcd is not a general-purpose database. It is designed for small amounts of critical metadata: the default request size limit is 1.5 MiB and the default storage quota is 2 GB. Do not store large blobs or high-volume transactional data.
- Cluster sizing matters. A 3-node cluster tolerates 1 failure and a 5-node cluster tolerates 2. Always run an odd number of nodes: an even-sized cluster needs a larger quorum without tolerating any more failures.
- Disk I/O latency directly affects etcd performance. Use SSDs for the data directory. Slow disks cause leader election timeouts and cluster instability.
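One way to guard against the first pitfall is to check a payload's size before writing it. This is a rough sketch: the fits_in_etcd helper is hypothetical, and it compares only the file size against the 1.5 MiB default (1572864 bytes), while the real limit applies to the whole request, including the key and protocol overhead.

```shell
# 1.5 MiB in bytes, the default size limit discussed in the first pitfall.
MAX_REQUEST_BYTES=1572864

# Succeeds (exit 0) if the file is smaller than the limit, fails otherwise.
# Approximation: the key and request overhead are not counted.
# Usage: fits_in_etcd <file>
fits_in_etcd() (
  # Arithmetic expansion strips the padding some wc implementations emit.
  size=$(( $(wc -c < "$1") ))
  [ "$size" -lt "$MAX_REQUEST_BYTES" ]
)
```

A caller could then write something like fits_in_etcd app.json && etcdctl put /config/app "$(cat app.json)", where app.json and /config/app are placeholder names.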
Frequently Asked Questions
Why does Kubernetes use etcd?
Kubernetes needs a reliable, consistent store for all cluster state: pod definitions, service endpoints, secrets, and config maps. etcd provides strong consistency guarantees via Raft consensus, which ensures that all Kubernetes API servers see the same data even during network partitions or node failures.
How many nodes should an etcd cluster have?
Run 3 nodes for most production deployments (tolerates 1 failure). Run 5 nodes if you need higher fault tolerance (tolerates 2 failures). More nodes increase write latency because Raft requires a majority quorum. Never run an even number of nodes.
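The sizing advice follows from majority-quorum arithmetic: a cluster of n members needs floor(n/2) + 1 of them to agree, so it tolerates n minus that many failures. A small sketch of the calculation (the function names are illustrative):

```shell
# Members that must agree for a write to commit.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Members that can fail while the cluster keeps its quorum.
fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 4 5 6 7; do
  printf '%d members: quorum %d, tolerates %d failure(s)\n' \
    "$n" "$(quorum "$n")" "$(fault_tolerance "$n")"
done
```

The loop shows that 4 members tolerate the same single failure as 3, and 6 the same two failures as 5, which is why even sizes add cost without adding resilience.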
How does etcd compare to Redis and Consul?
etcd and Redis serve different purposes. Redis is an in-memory data structure store optimized for speed and a variety of data types. etcd is optimized for consistency and reliability of small configuration data. Consul overlaps more with etcd for service discovery but adds health checking and a service mesh.
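What happens when etcd nodes fail?
If a minority of nodes fail, the cluster continues operating normally: reads and writes proceed through the remaining majority, and a failed node automatically catches up with the cluster when it comes back. If a majority fails, the cluster loses quorum and rejects writes and linearizable reads until quorum is restored. Surviving members can still answer serializable reads, which may return stale data.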
How do I back up and restore etcd?
Use etcdctl snapshot save to create a point-in-time snapshot of the entire datastore, and store these snapshots off-cluster. To restore, use etcdutl snapshot restore on etcd 3.5 and later, or etcdctl snapshot restore on older releases. For Kubernetes clusters, this is the primary disaster recovery mechanism for cluster state.
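A backup routine along these lines can be sketched as follows. The helper names and paths are hypothetical. The restore helper assumes etcdutl, which ships with etcd 3.5 and later; on older releases etcdctl snapshot restore takes the same arguments. Copying the snapshot off-cluster is left to whatever tooling is already in place.

```shell
# Build a timestamped snapshot path, e.g. /backups/etcd-20240101T120000Z.db
# Usage: snapshot_path <backup-dir>
snapshot_path() (
  dir=$1
  printf '%s/etcd-%s.db\n' "${dir%/}" "$(date -u +%Y%m%dT%H%M%SZ)"
)

# Save a snapshot into <backup-dir> and print its path on success.
# ETCDCTL can be overridden, e.g. ETCDCTL=echo for a dry run.
backup_etcd() (
  file=$(snapshot_path "$1")
  "${ETCDCTL:-etcdctl}" snapshot save "$file" && printf '%s\n' "$file"
)

# Restore a snapshot into a fresh data directory.
# Usage: restore_etcd <snapshot-file> <new-data-dir>
restore_etcd() (
  "${ETCDUTL:-etcdutl}" snapshot restore "$1" --data-dir "$2"
)
```

After a restore, etcd would be started with its data directory pointing at the newly created directory.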