# HashiCorp Serf — Decentralized Cluster Membership and Orchestration

> HashiCorp Serf is a lightweight agent for decentralized cluster membership, failure detection, and event-driven orchestration using a gossip protocol.

## Quick Use

```bash
# Install on Linux (amd64)
wget https://releases.hashicorp.com/serf/0.10.1/serf_0.10.1_linux_amd64.zip
unzip serf_0.10.1_linux_amd64.zip && sudo mv serf /usr/local/bin/

# Start two agents on one host; each needs its own gossip (-bind) and RPC (-rpc-addr) port
serf agent -node=web1 -bind=127.0.0.1:7946 -rpc-addr=127.0.0.1:7373 &
serf agent -node=web2 -bind=127.0.0.1:7947 -rpc-addr=127.0.0.1:7374 -join=127.0.0.1:7946 &

# List cluster members (talks to the agent at the default RPC address, 127.0.0.1:7373)
serf members
```

## Introduction

HashiCorp Serf is a decentralized tool for cluster membership, failure detection, and orchestration built on a gossip protocol. Unlike centralized service registries, Serf operates without a leader or single point of failure. Each node runs a lightweight agent that communicates through memberlist, HashiCorp's Go library implementing a SWIM-based gossip protocol. This makes Serf suitable for environments where eventual consistency and partition tolerance are preferred over strict coordination.

## What HashiCorp Serf Does

- Maintains a decentralized, eventually consistent view of cluster membership across all nodes
- Detects node failures within seconds using a gossip-based protocol with configurable probe intervals
- Propagates custom events and queries across the cluster for orchestration and coordination
- Triggers event handler scripts on membership changes (join, leave, fail, update) for automation
- Provides tagged membership for grouping nodes by role, datacenter, or application

## Architecture Overview

Serf is built on the memberlist library, which implements a variant of the SWIM protocol for gossip-based membership. Each agent periodically probes randomly selected peers and disseminates state changes (joins, leaves, failures) by piggybacking them on gossip messages. Custom user events and queries travel over the same gossip layer on their own broadcast queues rather than through a separate reliable-delivery channel; queries additionally carry a timeout (see `serf query -timeout`) during which responses are collected.
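
To illustrate why piggybacked gossip keeps overhead bounded, here is a minimal shell sketch of the retransmission budget that memberlist applies to each queued broadcast. It assumes memberlist's `retransmitLimit` formula, `RetransmitMult * ceil(log10(n + 1))` for `n` known members, and the `DefaultLANConfig` value of 4 for `RetransmitMult`. The shell function names are hypothetical, and nothing here talks to a real agent.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of memberlist's gossip retransmission budget:
#   limit = retransmit_mult * ceil(log10(n + 1))
# where n is the number of known cluster members.

# ceil(log10(n + 1)) is the number of decimal digits of n when n >= 1
# (and 0 when n = 0), so counting digits avoids floating-point rounding.
node_scale() {
  local n=$1
  if (( n <= 0 )); then
    echo 0
  else
    echo "${#n}"
  fi
}

# Number of times one broadcast (e.g. a join, failure, or user event)
# is piggybacked onto outgoing gossip before it leaves the queue.
retransmit_limit() {
  local mult=$1 n=$2
  echo $(( mult * $(node_scale "$n") ))
}

# Assumed default: RetransmitMult = 4 (memberlist DefaultLANConfig).
for members in 3 10 100 1000 5000; do
  printf 'members=%-5s retransmits per update=%s\n' \
    "$members" "$(retransmit_limit 4 "$members")"
done
```

Under these assumptions, growing from 10 to 5,000 members only doubles the per-update budget (8 to 16 retransmissions), which is the logarithmic growth mentioned in the FAQ.
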
There is no central server; every node is a peer, and the nodes' views of the cluster converge to the same membership through epidemic-style communication.

## Self-Hosting & Configuration

- Download a single binary for Linux, macOS, or Windows from releases.hashicorp.com
- Start an agent with `serf agent` and join existing clusters with `serf join` or the `-join` flag
- Configure the bind address, advertise address, encryption key, and log level via a config file (`-config-file`) or command-line flags
- Encrypt gossip traffic with a shared, base64-encoded AES key (memberlist accepts 16-, 24-, or 32-byte keys; `serf keygen` generates one), passed with `-encrypt` or in the config file
- Write event handler scripts (shell, Python, etc.) that Serf invokes on membership changes, user events, and queries

## Key Features

- Fully decentralized, with no leader election and no single point of failure
- Failure detection within seconds, tunable through probe intervals and suspicion timeouts
- Custom events and queries for ad-hoc cluster-wide orchestration without external coordination
- Node tags (key/value metadata) that filter `serf members` output, target queries, and are exposed to event handlers
- Lightweight single binary with minimal resource usage, suitable for embedded and edge deployments

## Comparison with Similar Tools

- **HashiCorp Consul**: a full service-discovery and service-mesh system with a KV store that uses Serf internally; Serf is lower-level and provides no service discovery or health-checking APIs
- **etcd**: a strongly consistent KV store using Raft; Serf is AP (eventually consistent) and stores no data
- **ZooKeeper**: a leader-based coordination service running on a dedicated ensemble; Serf is decentralized with no leader
- **memberlist**: the Go library Serf is built on; Serf adds a CLI, event handlers, and operational tooling on top
- **Akka Cluster**: gossip-based membership within the JVM; Serf is a standalone, system-level tool

## FAQ

**Q: How does Serf differ from Consul?**

A: Consul is a higher-level system built on Serf that adds service discovery, health checks, KV storage, and a service mesh. Serf provides only cluster membership, failure detection, and event propagation.
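
The event handler scripts described under Self-Hosting & Configuration receive their input from environment variables and stdin. The minimal sketch below assumes Serf's handler interface as described in its documentation: `SERF_EVENT` names the event (`member-join`, `member-leave`, `member-failed`, `member-update`, `member-reap`, `user`, or `query`); membership events write one tab-separated `NAME ADDRESS ROLE TAGS` line per affected member to stdin; user events set `SERF_USER_EVENT` and pass the payload on stdin; and whatever a query handler prints becomes that node's response. The `handle_event` function and its log format are hypothetical; a real handler would update DNS or a load balancer instead of printing.

```bash
#!/usr/bin/env bash
# Hypothetical Serf event handler sketch. Serf runs the handler with
# SERF_EVENT set; what arrives on stdin depends on the event type.

handle_event() {
  case "${SERF_EVENT:-}" in
    member-join|member-leave|member-failed|member-update|member-reap)
      # One member per line: NAME<TAB>ADDRESS<TAB>ROLE<TAB>TAGS.
      # awk with a tab separator keeps empty fields (e.g. no role) in place,
      # which `read` with IFS=$'\t' would collapse.
      awk -F'\t' -v ev="$SERF_EVENT" '{
        role = ($3 == "") ? "none" : $3
        printf "%s %s %s role=%s tags=%s\n", ev, $1, $2, role, $4
      }'
      ;;
    user)
      # Custom event: the name is in SERF_USER_EVENT, the payload on stdin.
      printf 'user event %s: %s\n' "${SERF_USER_EVENT:-unknown}" "$(cat)"
      ;;
    query)
      # For queries, whatever the handler prints is sent back as the response.
      printf '%s ok\n' "${SERF_SELF_NAME:-unknown}"
      ;;
    *)
      echo "ignoring unknown event '${SERF_EVENT:-}'" >&2
      ;;
  esac
}

handle_event
```

Such a script could be registered with, for example, `serf agent -event-handler=/usr/local/bin/handler.sh`, or limited to specific events with a filter such as `-event-handler=member-join,member-failed=/usr/local/bin/handler.sh` (the path is illustrative).
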

**Q: Can Serf handle network partitions?**

A: Serf is designed for partition tolerance. During a partition, each side marks unreachable members as failed and maintains its own membership view. Serf periodically tries to reconnect to failed members, so when connectivity is restored the two sides rejoin and membership converges through gossip.

**Q: How many nodes can a Serf cluster support?**

A: Serf scales to thousands of nodes. Each node's probe load stays roughly constant as the cluster grows, while the number of gossip retransmissions per update and the time to converge grow logarithmically with cluster size. Probe and gossip intervals can be tuned for larger clusters.

**Q: What happens to event handlers when a node fails?**

A: Surviving nodes detect the failure and invoke their configured event handler scripts with a `member-failed` event and the failed member's details, enabling automated responses such as DNS updates or load balancer reconfiguration.

## Sources

- https://github.com/hashicorp/serf
- https://www.serf.io/docs/

---

Source: https://tokrepo.com/en/workflows/asset-3af5a3cb

Author: AI Open Source