# SeaweedFS — Distributed Object, File, and S3 Storage

> Fast distributed storage system for blobs, files, the S3 API, and even Iceberg tables, designed to handle billions of files with O(1) disk access.

## Quick Use

```bash
# Single-node server (master + volume + filer + S3 + WebDAV)
wget https://github.com/seaweedfs/seaweedfs/releases/latest/download/linux_amd64_full.tar.gz
tar xzf linux_amd64_full.tar.gz && ./weed server -s3 -dir=/data

# S3-compatible endpoint at http://localhost:8333
aws --endpoint-url http://localhost:8333 s3 mb s3://bucket
aws --endpoint-url http://localhost:8333 s3 cp file.txt s3://bucket/
```

## Introduction

SeaweedFS started as a Facebook-Haystack-inspired blob store and grew into a full storage stack: object store, POSIX filer, HDFS-compatible filesystem, and S3/WebDAV gateways — all in one Go binary. It targets the "billions of small files" workload where traditional distributed filesystems choke on metadata.

## What SeaweedFS Does

- Stores blobs in large volume files, each indexed by a tiny in-memory lookup.
- Offers an S3-compatible API with bucket policies, presigned URLs, and lifecycle rules.
- Mounts as a POSIX filesystem via FUSE or WebDAV for app compatibility.
- Erasure-codes cold volumes (Reed-Solomon) to cut storage cost.
- Replicates across racks, datacenters, or clouds via async or sync modes.

## Architecture Overview

Three layers: masters run Raft and coordinate volume placement; volume servers hold the actual data in append-only log files; filers add a metadata DB (LevelDB, Redis, SQL, Cassandra, or FoundationDB) to provide a tree namespace. Gateways (S3, FUSE, WebDAV, iSCSI, NFS) translate their protocol into filer and volume calls.

## Self-Hosting & Configuration

- Small setup: one process with `weed server`; great up to tens of TB.
- Production: 3 masters + N volume servers + N filers behind a load balancer.
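The production layout above (a 3-master Raft quorum plus separate volume and filer processes) can be sketched with the stock `weed` subcommands. This is a minimal deployment sketch, not a hardened config; the hostnames, ports, and directories are placeholders:

```shell
# On each of the three master hosts (Raft quorum):
weed master -mdir=/data/master -port=9333 \
  -peers=master1:9333,master2:9333,master3:9333

# On every volume server; -mserver lists all masters:
weed volume -dir=/data/volume -port=8080 -max=100 \
  -mserver=master1:9333,master2:9333,master3:9333

# On each filer host (these go behind the load balancer):
weed filer -port=8888 -master=master1:9333,master2:9333,master3:9333

# Optional S3 gateway in front of a filer:
weed s3 -port=8333 -filer=localhost:8888
```

Each process registers itself with the masters, so adding capacity is just starting another `weed volume` pointed at the same master list.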
- Metadata store: Redis for speed, PostgreSQL for SQL queries, Cassandra for scale.
- Encrypt at rest with `volume.encrypt=true` and rotate via per-volume keys.
- Tier cold data to S3, GCS, or Azure Blob with the remote tiering daemon.

## Key Features

- O(1) disk read for every file — seek count does not grow with file count.
- Cloud tiering keeps hot data local and archives cold volumes transparently.
- Iceberg and Parquet integration turn SeaweedFS into a data-lake backend.
- Cross-cluster active-active replication keeps DCs in sync without a SAN.
- Runs on everything from a Raspberry Pi to a 1000-node cluster.

## Comparison with Similar Tools

- **MinIO** — simpler S3-only server, no POSIX filer, AGPL license.
- **Ceph** — block + object + file, powerful but operationally heavy.
- **GlusterFS** — POSIX-first, maintenance slowing, no S3 by default.
- **JuiceFS** — POSIX over object storage with a separate metadata DB.
- **OpenIO** — S3-focused, now part of OVH, fewer community tools.

## Operations & Tooling

- One binary covers master, volume, filer, S3, mount — easy to script.
- Built-in web UI for volume health, replication, and per-bucket stats.
- gRPC and REST APIs for everything the CLI does.

## FAQ

**Q:** Can I replace MinIO with it?
**A:** Yes for S3 workloads; SeaweedFS also handles POSIX and WebDAV, which MinIO does not.

**Q:** Consistency model?
**A:** Strong for metadata via Raft on the masters; eventual for async cross-DC replication.

**Q:** License?
**A:** Apache 2.0 — commercial use is fine.

**Q:** Kubernetes?
**A:** Official Helm chart plus a CSI driver for PVCs backed by the filer.

## Sources

- https://github.com/seaweedfs/seaweedfs
- https://github.com/seaweedfs/seaweedfs/wiki
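## Appendix: Presigning Without an SDK

The S3 gateway speaks standard AWS Signature Version 4, which is what makes the `aws` CLI and SDKs work against it. As a self-contained illustration of that protocol, the sketch below builds a query-presigned GET URL using only the Python standard library. The endpoint, credentials, and function name are hypothetical examples, not SeaweedFS APIs; in practice you would let an SDK such as boto3 do this:

```python
import datetime
import hashlib
import hmac
from urllib.parse import quote


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode(), hashlib.sha256).digest()


def presign_get(endpoint, bucket, key, access_key, secret_key,
                region="us-east-1", expires=3600, now=None):
    """Build a SigV4 query-presigned GET URL (UNSIGNED-PAYLOAD, host header only)."""
    now = now or datetime.datetime.utcnow()
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    datestamp = now.strftime("%Y%m%d")
    host = endpoint.split("://", 1)[1]
    scope = f"{datestamp}/{region}/s3/aws4_request"
    path = f"/{bucket}/{quote(key)}"
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    # Canonical query string: keys sorted, everything percent-encoded.
    canonical_query = "&".join(
        f"{quote(k, safe='')}={quote(v, safe='')}" for k, v in sorted(params.items()))
    canonical_request = "\n".join([
        "GET", path, canonical_query,
        f"host:{host}\n",          # canonical headers, then a blank line
        "host", "UNSIGNED-PAYLOAD"])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode()).hexdigest()])
    # Derive the signing key: date -> region -> service -> "aws4_request".
    k = _hmac(("AWS4" + secret_key).encode(), datestamp)
    for part in (region, "s3", "aws4_request"):
        k = _hmac(k, part)
    signature = hmac.new(k, string_to_sign.encode(), hashlib.sha256).hexdigest()
    return f"{endpoint}{path}?{canonical_query}&X-Amz-Signature={signature}"


url = presign_get("http://localhost:8333", "bucket", "file.txt",
                  "your-access-key", "your-secret-key")
```

Anyone holding `url` can fetch the object until the expiry passes, without credentials of their own — the same mechanism the gateway's presigned-URL support exposes.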