Introduction
LitmusChaos is a CNCF incubating project that brings chaos engineering to Kubernetes. It provides a framework for running controlled failure experiments, such as pod kills, network delays, and CPU stress, so teams can verify that their applications recover gracefully under adverse conditions.
What LitmusChaos Does
- Defines chaos experiments declaratively as Kubernetes custom resources (CRDs) written in YAML
- Offers a ChaosHub with 50+ prebuilt experiments for pods, nodes, and infrastructure
- Provides a web-based ChaosCenter for designing, scheduling, and observing experiments
- Supports steady-state hypothesis checks to validate resilience automatically
- Integrates with CI/CD pipelines to run chaos tests as part of deployment validation
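When experiments are run declaratively, each one is requested through a ChaosEngine resource. The following is a minimal sketch that builds a pod-delete ChaosEngine as a Python dict and prints it as JSON, which `kubectl apply -f -` accepts just like YAML. Field names follow the `litmuschaos.io/v1alpha1` examples from the Litmus 2.x documentation and may differ in the CRD version installed in your cluster; the engine name, the `app=nginx` label, and the `pod-delete-sa` service account are hypothetical.

```python
import json


def pod_delete_engine(name, namespace, app_label, service_account,
                      duration_s=30, interval_s=10):
    """Build a ChaosEngine manifest (as a dict) that runs the pod-delete experiment.

    Field names follow litmuschaos.io/v1alpha1 examples; verify them against
    the CRD version installed in your cluster.
    """
    return {
        "apiVersion": "litmuschaos.io/v1alpha1",
        "kind": "ChaosEngine",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Which workload to target: namespace, label selector, and kind.
            "appinfo": {
                "appns": namespace,
                "applabel": app_label,
                "appkind": "deployment",
            },
            # "active" starts the run; patching it to "stop" aborts the chaos.
            "engineState": "active",
            "chaosServiceAccount": service_account,
            "experiments": [{
                "name": "pod-delete",
                "spec": {"components": {"env": [
                    # Experiment tunables are passed as env vars with string values.
                    {"name": "TOTAL_CHAOS_DURATION", "value": str(duration_s)},
                    {"name": "CHAOS_INTERVAL", "value": str(interval_s)},
                    {"name": "FORCE", "value": "false"},
                ]}},
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical names; pipe the output to: kubectl apply -f -
    engine = pod_delete_engine("nginx-chaos", "default", "app=nginx", "pod-delete-sa")
    print(json.dumps(engine, indent=2))
```

Generating the manifest from code keeps the tunables in one place, which is convenient when a CI job needs to vary duration or targets per environment.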
Architecture Overview
LitmusChaos consists of a control plane (ChaosCenter) and an execution plane. ChaosCenter is a web application backed by MongoDB that manages experiment definitions and schedules. The execution plane runs in each target cluster as a set of components; at its core, the Chaos Operator watches ChaosEngine custom resources and launches the pods that inject the specified failure. Each run records its outcome in a ChaosResult resource, and results are reported back to ChaosCenter for analysis and visualization.
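The operator's watch-and-launch behavior can be pictured with a toy, in-memory model. Everything in this sketch (the `Engine` and `ToyOperator` classes and the status strings) is hypothetical and stands in for real Kubernetes objects; the actual Chaos Operator is a Go controller that reconciles ChaosEngine resources against the API server and launches experiments through a chaos-runner pod.

```python
from dataclasses import dataclass, field


@dataclass
class Engine:
    """Hypothetical in-memory stand-in for a ChaosEngine custom resource."""
    name: str
    experiments: list
    engine_state: str = "active"  # "active" or "stop"
    status: str = "initialized"   # hypothetical status values


@dataclass
class ToyOperator:
    """Conceptual model of a reconcile loop: it only records what it would launch."""
    launched: dict = field(default_factory=dict)  # engine name -> experiment pod names

    def reconcile(self, engine: Engine) -> str:
        if engine.engine_state == "active" and engine.name not in self.launched:
            # One pod per experiment; the real operator goes through a chaos-runner pod.
            self.launched[engine.name] = [f"{engine.name}-{exp}" for exp in engine.experiments]
            engine.status = "running"
        elif engine.engine_state == "stop" and engine.name in self.launched:
            # Stopping an engine tears down whatever was launched for it.
            del self.launched[engine.name]
            engine.status = "stopped"
        # Any other combination is already reconciled, so nothing changes.
        return engine.status
```

The key property is idempotence: calling `reconcile` repeatedly on an unchanged engine launches nothing new, which is what lets a controller react safely to repeated watch events.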
Self-Hosting & Configuration
- Deploy ChaosCenter via Helm chart or kubectl manifests into a management cluster
- Register target clusters as Chaos Delegates through the ChaosCenter UI
- Browse the ChaosHub to select and customize experiments
- Define ChaosWorkflows combining multiple experiments with steady-state checks
- Schedule recurring chaos tests via cron expressions in the workflow definition
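Before committing a schedule, it can help to check which times a cron expression actually fires. This is a small stdlib-only matcher, assuming the scheduler uses standard five-field syntax (minute, hour, day-of-month, month, day-of-week), the same format Kubernetes CronJobs use; the Litmus scheduler may also accept extensions such as `@daily` that this sketch does not handle.

```python
from datetime import datetime


def _expand(field: str, lo: int, hi: int) -> set:
    """Expand one cron field such as "*", "5", "1-5", "*/15" or "0,30" into a set of ints."""
    values = set()
    for part in field.split(","):
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if rng == "*":
            start, end = lo, hi
        elif "-" in rng:
            start, end = (int(x) for x in rng.split("-"))
        else:
            start = int(rng)
            # "N/step" is read as "from N to the maximum, every step".
            end = hi if step_text else start
        if not lo <= start <= end <= hi or step < 1:
            raise ValueError(f"invalid cron field {field!r} for range {lo}-{hi}")
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expr: str, dt: datetime) -> bool:
    """Return True if a standard 5-field cron expression fires at dt's minute."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected: minute hour day-of-month month day-of-week")
    minute, hour, dom, month, dow = fields
    dows = _expand(dow, 0, 7)
    if 7 in dows:  # both 0 and 7 mean Sunday
        dows.add(0)
    cron_dow = (dt.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    dom_ok = dt.day in _expand(dom, 1, 31)
    dow_ok = cron_dow in dows
    # Classic cron rule: when both day fields are restricted (neither is a bare
    # "*"), a match on either one is enough.
    if dom != "*" and dow != "*":
        day_ok = dom_ok or dow_ok
    else:
        day_ok = dom_ok and dow_ok
    return (dt.minute in _expand(minute, 0, 59)
            and dt.hour in _expand(hour, 0, 23)
            and dt.month in _expand(month, 1, 12)
            and day_ok)
```

For example, `cron_matches("0 2 * * 1-5", datetime(2024, 1, 1, 2, 0))` is true because 1 January 2024 was a Monday, so an off-hours weekday schedule would fire then but not on the following Saturday.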
Key Features
- CNCF incubating project with an active community and vendor-neutral governance
- 50+ prebuilt experiments covering pod, node, network, DNS, and cloud provider faults
- GitOps-native experiment management with version-controlled workflow definitions
- Observability integration with Prometheus metrics and Grafana dashboards
- Multi-cluster chaos orchestration from a single ChaosCenter instance
Comparison with Similar Tools
- Chaos Mesh — CNCF project offering similar Kubernetes-native chaos experiments; LitmusChaos differentiates itself mainly through ChaosCenter and the ChaosHub experiment marketplace
- Gremlin — Commercial SaaS chaos platform; LitmusChaos is fully open-source and self-hosted
- AWS Fault Injection Simulator — AWS-only managed service; LitmusChaos works on any Kubernetes cluster
- Pumba — Docker-level chaos tool; LitmusChaos operates at the Kubernetes abstraction layer with CRD-driven workflows
FAQ
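Q: Can LitmusChaos cause production outages? A: Yes, if misconfigured, because experiments inject real failures. Their reach is limited by the target namespace, label selectors, and blast-radius settings such as the percentage of pods affected. Start with non-production clusters and narrow targeting to reduce risk.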
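Pod-level experiments such as pod-delete expose a `PODS_AFFECTED_PERC` tunable for blast radius. This sketch estimates how many matching pods a given percentage would hit, assuming the count is rounded down but never drops below one pod; that rounding rule is an assumption to verify against the experiment you run, and `select_targets` is a hypothetical helper.

```python
import random


def select_targets(pods, pods_affected_perc, seed=None):
    """Pick the pods an experiment would disrupt for a given PODS_AFFECTED_PERC.

    Assumption: count = floor(perc * len(pods) / 100), but at least 1, so a
    non-empty target set always loses at least one pod.
    """
    if not pods:
        return []
    if not 0 <= pods_affected_perc <= 100:
        raise ValueError("pods_affected_perc must be between 0 and 100")
    count = max(1, pods_affected_perc * len(pods) // 100)
    # A seeded generator makes the selection reproducible for dry runs.
    return random.Random(seed).sample(pods, count)
```

Under this assumption, 25% of a 10-replica deployment disrupts 2 pods, and even 0% still disrupts one, which is worth knowing before pointing an experiment at a small production deployment.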
Q: Does it require ChaosCenter to run experiments? A: No. You can run experiments directly via ChaosEngine CRDs and kubectl without ChaosCenter, though the UI simplifies workflow management.
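Without ChaosCenter, the outcome of a run can be read from the ChaosResult resource the experiment writes, which by convention is named `<engine-name>-<experiment-name>`. This sketch turns that result into a CI exit code, assuming `kubectl` is on the PATH and that the verdict lives at `status.experimentStatus.verdict` (with values such as `Pass`, `Fail`, or `Awaited`), as in Litmus 2.x; treating a missing verdict as not passed is a design choice of this sketch.

```python
import json
import subprocess


def chaos_verdict(result):
    """Extract (verdict, phase) from a ChaosResult object parsed from JSON."""
    status = result.get("status", {}).get("experimentStatus", {})
    # A missing verdict is treated as still pending.
    return status.get("verdict", "Awaited"), status.get("phase", "")


def gate(result):
    """CI-style exit code: 0 only if the experiment finished with a Pass verdict."""
    verdict, _ = chaos_verdict(result)
    return 0 if verdict == "Pass" else 1


def fetch_result(name, namespace):
    """Read a ChaosResult with kubectl (assumes kubectl targets the right cluster)."""
    out = subprocess.run(
        ["kubectl", "get", "chaosresult", name, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

In a pipeline step, `sys.exit(gate(fetch_result("nginx-chaos-pod-delete", "default")))` would fail the job unless the (hypothetical) `nginx-chaos` engine's pod-delete run passed.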
Q: How do I create a custom chaos experiment? A: Write the experiment logic in Go or as a shell script, package it as a container image, and either publish it to a custom ChaosHub or reference it inline in your workflow.
Q: What steady-state hypothesis checks are supported? A: Built-in probes cover HTTP endpoints (httpProbe), command output (cmdProbe), Kubernetes resource state (k8sProbe), and Prometheus queries (promProbe).
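Conceptually, a Prometheus-based check runs a PromQL query and compares the result against a criterion. This standalone sketch does the same against the Prometheus HTTP API (`/api/v1/query`) using only the standard library; it is not Litmus's probe implementation, and the server URL, query, and threshold in the usage note are hypothetical.

```python
import json
import operator
import urllib.parse
import urllib.request

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "==": operator.eq, "!=": operator.ne}


def instant_query(base_url, promql, timeout=5.0):
    """Run a PromQL instant query via Prometheus's HTTP API and return the parsed JSON."""
    url = f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def vector_satisfies(response, criteria, threshold):
    """True if every sample of an instant-vector response satisfies value <criteria> threshold.

    An empty result counts as a failure: no data often means the metric
    disappeared during the chaos window.
    """
    if response.get("status") != "success":
        return False
    data = response["data"]
    if data["resultType"] != "vector" or not data["result"]:
        return False
    compare = OPS[criteria]
    # Each sample's "value" is [unix_timestamp, "string-encoded float"].
    return all(compare(float(s["value"][1]), threshold) for s in data["result"])
```

For example, `vector_satisfies(instant_query("http://prometheus:9090", 'sum(rate(http_requests_total{code=~"5.."}[1m]))'), "<", 0.05)` would assert that the 5xx rate stays below 0.05 requests per second while the experiment runs.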