# HashiCorp Serf — Decentralized Cluster Membership and Orchestration

> HashiCorp Serf is a lightweight agent for decentralized cluster membership, failure detection, and event-driven orchestration using a gossip protocol.

## Quick Use
```bash
# Install on Linux
wget https://releases.hashicorp.com/serf/0.10.1/serf_0.10.1_linux_amd64.zip
unzip serf_0.10.1_linux_amd64.zip && sudo mv serf /usr/local/bin/

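# Start two agents on one host; the second needs its own gossip and RPC ports
serf agent -node=web1 -bind=127.0.0.1:7946 &
sleep 1   # give web1 a moment to start listening before web2 joins
serf agent -node=web2 -bind=127.0.0.1:7947 -rpc-addr=127.0.0.1:7374 -join=127.0.0.1:7946 &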

# List cluster members
serf members
```

## Introduction
HashiCorp Serf is a decentralized tool for cluster membership, failure detection, and orchestration built on a gossip protocol. Unlike centralized service registries, Serf operates without a leader or single point of failure. Each node runs a lightweight agent that communicates via the SWIM-based memberlist library, making it suitable for environments where eventual consistency and partition tolerance are preferred over strict coordination.

## What HashiCorp Serf Does
- Maintains a decentralized, eventually consistent view of cluster membership across all nodes
- Detects node failures within seconds using a gossip-based protocol with configurable probe intervals
- Propagates custom events and queries across the cluster for orchestration and coordination
- Triggers event handler scripts on membership changes (join, leave, fail, update) for automation
- Provides tagged membership for grouping nodes by role, datacenter, or application
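
As a minimal sketch of an event handler for membership changes: Serf sets the `SERF_EVENT` environment variable and, for member events, writes one tab-separated line per affected member (name, address, role, tags) to the handler's standard input. The function name `log_members` and the output format are illustrative choices, not part of Serf.

```bash
#!/bin/sh
# Hypothetical handler sketch: print one line per member named in a
# membership event. Only the first two stdin fields (name, address) are used;
# the remaining fields (role, tags) are collected into "rest" and ignored.

tab=$(printf '\t')

log_members() {
  event="$1"
  while IFS="$tab" read -r name address rest; do
    [ -n "$name" ] || continue          # skip blank lines
    printf '%s %s %s\n' "$event" "$name" "$address"
  done
}

# Act only on membership events; do nothing for user events, queries,
# or when the script is run outside Serf (SERF_EVENT unset).
case "${SERF_EVENT:-}" in
  member-join|member-leave|member-failed|member-update|member-reap)
    log_members "$SERF_EVENT"
    ;;
esac
```

Saved as an executable file (here assumed to be `./handler.sh`), it can be registered with `serf agent -event-handler=./handler.sh`, or limited to one event type with `-event-handler=member-failed=./handler.sh`.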

## Architecture Overview
Serf is built on the memberlist library, which implements a variant of the SWIM protocol for gossip-based membership. Each agent periodically probes random peers and disseminates state changes (joins, leaves, failures) by piggybacking them on gossip messages. Custom events are disseminated over the same gossip layer on a best-effort basis, with Lamport clocks providing ordering. There is no central server: every node is a peer, and each node's view of the cluster converges toward the others' through epidemic-style communication.

## Self-Hosting & Configuration
- Download a single binary for Linux, macOS, or Windows from releases.hashicorp.com
- Start an agent with `serf agent` and join an existing cluster via `serf join` or the `-join` flag
- Configure bind address, advertise address, encryption key, and log level via config file or flags
- Enable encryption for gossip traffic with a shared, base64-encoded symmetric key (generate one with `serf keygen`), passed via `-encrypt` or the config file
- Write event handler scripts (shell, Python, etc.) that Serf invokes on cluster membership changes
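
The settings listed above can be collected in a JSON config file. The sketch below writes one from a shell function; the node name, addresses, tags, and handler path are placeholder values, and `write_serf_config` is a hypothetical helper rather than anything Serf ships.

```bash
#!/bin/sh
# Hypothetical helper: write a sample Serf agent config into a directory
# and print the path of the file it created.
#   $1 = target directory
#   $2 = base64 gossip key, e.g. the output of `serf keygen`
write_serf_config() {
  conf_dir="$1"
  key="$2"
  mkdir -p "$conf_dir" || return 1
  cat > "$conf_dir/serf.json" <<EOF
{
  "node_name": "web1",
  "bind": "0.0.0.0:7946",
  "advertise": "10.0.0.11",
  "encrypt_key": "$key",
  "log_level": "info",
  "tags": { "role": "web", "dc": "dc1" },
  "event_handlers": ["member-failed=/etc/serf/on-fail.sh"],
  "start_join": ["10.0.0.10"]
}
EOF
  printf '%s\n' "$conf_dir/serf.json"
}

# Example use (left commented so sourcing this file has no side effects):
#   conf=$(write_serf_config ./serf-demo "$(serf keygen)")
#   serf agent -config-file="$conf"
```

Every agent in the cluster must use the same key, so in practice the key would be generated once and distributed rather than regenerated per node.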

## Key Features
- Fully decentralized with no leader election and no single point of failure
- Failure detection within seconds, tunable through probe intervals and suspicion timeouts
- Custom events and queries for ad-hoc cluster-wide orchestration without external coordination
- Node tags for metadata-driven routing and filtering of event handlers
- Lightweight single binary with minimal resource usage suitable for embedded and edge deployments
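
Custom events and queries reach the same handler mechanism as membership changes. In this sketch, Serf sets `SERF_EVENT` to `user` or `query`, names the event or query in `SERF_USER_EVENT` or `SERF_QUERY_NAME`, and passes any payload on standard input; for a query, the handler's standard output becomes the response. The event name `deploy`, the query name `whoami`, and the function `handle_custom` are made up for illustration.

```bash
#!/bin/sh
# Hypothetical handler sketch for user events and queries.
handle_custom() {
  case "${SERF_EVENT:-}" in
    user)
      # The payload of a user event arrives on stdin.
      payload=$(cat)
      printf 'user event %s payload=%s\n' "${SERF_USER_EVENT:-}" "$payload"
      ;;
    query)
      # Whatever is printed here is sent back to the node that asked.
      case "${SERF_QUERY_NAME:-}" in
        whoami) printf '%s\n' "${SERF_SELF_NAME:-unknown}" ;;
        *)      printf 'unsupported query\n' ;;
      esac
      ;;
  esac
}

# Run only when Serf has set SERF_EVENT.
if [ -n "${SERF_EVENT:-}" ]; then
  handle_custom
fi
```

With this handler registered on the agents, `serf event deploy v1.2` would trigger the `user` branch on every node, and `serf query whoami` would collect one response per node.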

## Comparison with Similar Tools
- **HashiCorp Consul** — full service mesh and KV store that uses Serf internally; Serf is lower-level and does not provide service discovery or health checking APIs
- **etcd** — strongly consistent KV store using Raft; Serf is AP (eventual consistency) with no data storage
- **ZooKeeper** — centralized coordination service; Serf is decentralized with no leader
- **memberlist** — the Go library Serf is built on; Serf adds CLI, event handlers, and operational tooling on top
- **Gossip protocols (Akka Cluster)** — similar approach within the JVM; Serf is a standalone system-level tool

## FAQ
**Q: How does Serf differ from Consul?**
A: Consul is a higher-level system built on Serf that adds service discovery, health checks, KV storage, and service mesh. Serf provides only cluster membership, failure detection, and event propagation.

**Q: Can Serf handle network partitions?**
A: Serf is designed for partition tolerance. During a partition, each side maintains its own membership view. When connectivity is restored, membership state converges through gossip reconciliation.

**Q: How many nodes can a Serf cluster support?**
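A: Serf is designed to scale to clusters of thousands of nodes. In SWIM-style gossip, the per-node message load stays roughly constant as the cluster grows, while the time for an update to reach every node grows roughly logarithmically with cluster size. Probe and gossip intervals can be tuned for larger clusters.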
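
As a back-of-the-envelope illustration of the logarithmic growth (a simplified bound under assumed numbers, not a measurement of Serf): suppose every node that already knows an update forwards it to $k$ peers per gossip round, where $k$ is an assumed fanout. The informed set can grow by at most a factor of $k + 1$ per round, so reaching all $n$ nodes takes at least

$$
r_{\min} = \left\lceil \log_{k+1} n \right\rceil
$$

rounds. With $n = 1000$ and $k = 3$, this gives $\lceil \log_4 1000 \rceil = 5$, since $4^4 = 256 < 1000 \le 4^5 = 1024$. Growing the cluster tenfold to $n = 10000$ raises the bound only to $7$, since $4^6 = 4096 < 10000 \le 4^7 = 16384$. Real gossip needs more rounds than this lower bound because randomly chosen targets overlap, but the number of rounds still grows with $\log n$ rather than with $n$.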

**Q: What happens to event handlers when a node fails?**
A: Surviving nodes detect the failure and invoke their configured event handler scripts with the failed member details, enabling automated responses like DNS updates or load balancer reconfiguration.

## Sources
- https://github.com/hashicorp/serf
- https://www.serf.io/docs/

---
Source: https://tokrepo.com/en/workflows/asset-3af5a3cb
Author: AI Open Source