Chaos Mesh — Cloud-Native Chaos Engineering for Kubernetes

Introduction

Chaos Mesh is a CNCF incubating project that lets platform teams run controlled, reproducible failure experiments against a live Kubernetes cluster. Because faults are expressed as CRDs, teams can version, schedule, and gate experiments in CI the same way they manage any other Kubernetes resource, so resilience claims can be tested rather than assumed.

What Chaos Mesh Does

Injects pod failures (pod-kill, container-kill, pod-failure)
Simulates network chaos: latency, packet loss, bandwidth throttle, partition
Injects disk I/O faults: read/write latency, errors, fill-up
Injects clock skew, DNS chaos, HTTP chaos, and kernel faults (via BPF)
Orchestrates multi-step Workflows for complex game-day scenarios
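
To make the fault list above concrete, here is a minimal sketch of a network-latency experiment built as a Python dict and printed as JSON. The resource name web-latency, the staging namespace, and the app=web label are illustrative assumptions, not values from this page; check the field names against the CRD version installed in your cluster.

```python
import json


def network_delay_chaos(name, namespace, target_labels, latency="200ms", duration="60s"):
    """Build a NetworkChaos manifest (as a dict) that delays traffic for matching pods."""
    return {
        "apiVersion": "chaos-mesh.org/v1alpha1",
        "kind": "NetworkChaos",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "action": "delay",       # which network fault to inject
            "mode": "all",           # apply to every pod the selector matches
            "selector": {
                "namespaces": [namespace],
                "labelSelectors": target_labels,
            },
            "delay": {"latency": latency},
            "duration": duration,    # the fault is reverted after this long
        },
    }


# Hypothetical target: pods labelled app=web in a "staging" namespace.
manifest = network_delay_chaos("web-latency", "staging", {"app": "web"})
print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed output can be piped into kubectl apply -f - or committed to a GitOps repository.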

Architecture Overview

Chaos Mesh ships a controller manager, a per-node chaos-daemon DaemonSet (which uses nsenter/iptables/tc/BPF for kernel-level injection), and a React dashboard. CRDs declare the experiment; the controller resolves target pods, instructs the daemons, and records status transitions. Experiments are cleaned up automatically at duration end or when the CR is deleted.
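
The "resolves target pods" step can be pictured with a toy model. The sketch below is not the controller's actual code: it is a simplified illustration of filtering pods by namespace and labels and then applying a selection mode. The mode names mirror those used in experiment specs (all, one, fixed, fixed-percent); the pod dict shape and the floor rounding for percentages are assumptions made for this example.

```python
import random


def matches(pod_labels, selector):
    """True when every key/value pair in the selector is present on the pod."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


def resolve_targets(pods, namespaces, label_selector, mode="all", value=None, rng=None):
    """Toy model of target resolution: filter candidates, then apply the mode."""
    rng = rng or random.Random()
    candidates = [
        pod for pod in pods
        if pod["namespace"] in namespaces and matches(pod["labels"], label_selector)
    ]
    if mode == "all":
        return candidates
    if mode == "one":
        return [rng.choice(candidates)] if candidates else []
    if mode == "fixed":
        return rng.sample(candidates, min(int(value), len(candidates)))
    if mode == "fixed-percent":
        # Assumption: round down when the percentage is not a whole pod.
        return rng.sample(candidates, len(candidates) * int(value) // 100)
    raise ValueError(f"unsupported mode: {mode}")


# Hypothetical cluster state used only for illustration.
pods = [
    {"name": "web-0", "namespace": "staging", "labels": {"app": "web"}},
    {"name": "web-1", "namespace": "staging", "labels": {"app": "web"}},
    {"name": "db-0", "namespace": "staging", "labels": {"app": "db"}},
    {"name": "web-0", "namespace": "prod", "labels": {"app": "web"}},
]
targets = resolve_targets(pods, ["staging"], {"app": "web"}, mode="all")
print([pod["name"] for pod in targets])
```

The point of the model is that namespace and label filters define the blast radius first, and the mode only narrows it further.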

Self-Hosting & Configuration

Install: Helm chart or Operator; chaos-controller-manager and the chaos-daemon DaemonSet run cluster-wide
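Namespace scoping: with controllerManager.enableFilterNamespace=true, only namespaces annotated chaos-mesh.org/inject=enabled can be targeted; pair this with Kubernetes RBAC to limit who can create experiments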
Dashboard with Google/GitHub/OIDC SSO; chaosctl CLI for scripting
Integrations: Argo Workflows, GitHub Actions, Litmus via CRD
Observability: Prometheus metrics, experiment events shipped to OpenTelemetry
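
As a rough sketch of the install and namespace opt-in steps, the snippet below only assembles and prints the commands; it does not run them. It assumes the upstream chart repository at https://charts.chaos-mesh.org, the controllerManager.enableFilterNamespace chart value, and the chaos-mesh.org/inject=enabled namespace annotation as described in the upstream documentation. The release name and the staging namespace are placeholders; verify all of it against your chart version.

```python
import shlex

# Assumed upstream chart repository; confirm before use.
HELM_REPO_ADD = ["helm", "repo", "add", "chaos-mesh", "https://charts.chaos-mesh.org"]


def helm_install_args(release="chaos-mesh", namespace="chaos-mesh", filter_namespaces=True):
    """Build the helm install command; optionally turn on namespace filtering."""
    args = [
        "helm", "install", release, "chaos-mesh/chaos-mesh",
        "--namespace", namespace, "--create-namespace",
    ]
    if filter_namespaces:
        # With filtering on, only annotated namespaces accept fault injection.
        args += ["--set", "controllerManager.enableFilterNamespace=true"]
    return args


def annotate_namespace_args(target_namespace):
    """Build the kubectl command that opts one namespace in to chaos injection."""
    return ["kubectl", "annotate", "namespace", target_namespace,
            "chaos-mesh.org/inject=enabled"]


for command in (HELM_REPO_ADD, helm_install_args(), annotate_namespace_args("staging")):
    print(shlex.join(command))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; in a real pipeline the same lists could be passed to subprocess.run.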

Key Features

Pure CRD interface — GitOps and code review friendly
Rich fault taxonomy (pod, net, IO, DNS, HTTP, kernel, time)
Schedule + Workflow resources for recurring and multi-step drills
Safety switches: dry-run, blast-radius labels, auto-recovery on CR deletion
CNCF incubating project with active PingCAP + community maintainers
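
A recurring drill of the kind the Schedule resource supports might look like the sketch below, which wraps a pod-kill experiment in a cron schedule. The resource name, namespace, label, cron expression, and history limit are illustrative assumptions; confirm the field names against the Schedule CRD in your cluster.

```python
import json


def scheduled_pod_kill(name, namespace, target_labels, cron="*/10 * * * *"):
    """Build a Schedule manifest that kills one matching pod on a cron cadence."""
    return {
        "apiVersion": "chaos-mesh.org/v1alpha1",
        "kind": "Schedule",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "schedule": cron,               # standard five-field cron expression
            "type": "PodChaos",             # kind of experiment to spawn each run
            "historyLimit": 5,              # keep only the last few spawned experiments
            "concurrencyPolicy": "Forbid",  # skip a run if the previous one is still active
            "podChaos": {
                "action": "pod-kill",
                "mode": "one",              # one randomly chosen pod per run
                "selector": {
                    "namespaces": [namespace],
                    "labelSelectors": target_labels,
                },
            },
        },
    }


# Hypothetical drill: kill one app=web pod in "staging" every ten minutes.
schedule = scheduled_pod_kill("web-pod-kill-drill", "staging", {"app": "web"})
print(json.dumps(schedule, indent=2))
```

Using mode "one" together with concurrencyPolicy "Forbid" keeps each run small and prevents overlapping runs from compounding the blast radius.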

Comparison with Similar Tools

LitmusChaos — similar CNCF project; experiment-hub workflow, different CR model
Gremlin — commercial SaaS; richer UI, paid per-target
Chaos Monkey (Netflix) — original, EC2-only, limited to instance termination
AWS Fault Injection Simulator — AWS-native; tightly coupled to AWS APIs
PowerfulSeal — older, less active; mostly pod-kill scope

FAQ

Q: Is it safe for production? A: With proper namespace selectors, blast-radius labels, and approvals, many teams run Chaos Mesh in prod game-days. Start in staging.

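Q: Does it need privileged pods? A: Yes. chaos-daemon runs with elevated privileges on every node (privileged mode or an equivalent set of Linux capabilities) so it can enter target pods' namespaces for iptables/tc injection.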

Q: Can I run experiments in CI? A: Yes. Apply the experiment manifest with chaosctl or raw kubectl from a GitHub Actions job, then assert recovery with Prometheus queries.
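
One way to express the "assert recovery" step is a small check against the Prometheus HTTP API. This is a sketch under assumptions: the Prometheus base URL, the PromQL expression, and the threshold are placeholders you would supply, and the parser handles only instant-vector results.

```python
import json
import urllib.parse
import urllib.request


def instant_query_url(base_url, promql):
    """Build the URL for a Prometheus instant query."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def max_sample(response):
    """Return the largest sample value from an instant-vector query response."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error', 'unknown error')}")
    results = response["data"]["result"]
    if not results:
        raise RuntimeError("query returned no series")
    # Each vector sample is [timestamp, "value-as-string"].
    return max(float(series["value"][1]) for series in results)


def assert_recovered(base_url, promql, threshold):
    """Fail the CI step (non-zero exit) when the metric is still above the threshold."""
    with urllib.request.urlopen(instant_query_url(base_url, promql), timeout=10) as resp:
        payload = json.load(resp)
    value = max_sample(payload)
    if value > threshold:
        raise SystemExit(f"not recovered: {value} > {threshold}")
    return value


# Canned response in the documented shape, so the parser can be tried offline.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"pod": "web-0"}, "value": [1700000000, "0.01"]},
            {"metric": {"pod": "web-1"}, "value": [1700000000, "0.02"]},
        ],
    },
}
print(max_sample(sample_response))
```

In a pipeline, assert_recovered would run after the experiment's duration has elapsed, with a hypothetical call such as assert_recovered("http://prometheus:9090", "<your error-rate query>", 0.05).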

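Q: How do I stop a rogue experiment? A: Delete its CR. For an experiment named web-latency, for example, kubectl delete networkchaos web-latency (add -n <namespace> if it is not in the current namespace) triggers automatic cleanup within seconds.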
