LitmusChaos — Cloud-Native Chaos Engineering for Kubernetes
Inject controlled failures into your Kubernetes workloads to test resilience. A CNCF incubating project with a library of 50+ chaos experiments.
What it is
LitmusChaos is a CNCF incubating project that provides a framework for practicing chaos engineering on Kubernetes. It ships with a library of 50+ pre-built chaos experiments -- pod kill, network latency, CPU hog, disk fill, and more -- that you inject into running workloads to verify they handle failures gracefully.
The tool targets SREs, platform engineers, and DevOps teams who manage Kubernetes clusters in production. If you need to prove your services survive node failures, network partitions, or resource exhaustion, LitmusChaos gives you repeatable experiments with observable outcomes.
How it saves time or tokens
Manually testing failure scenarios requires writing custom scripts, coordinating team members, and hoping you covered the right cases. LitmusChaos replaces that ad-hoc process with a catalog of pre-built experiments and a ChaosCenter dashboard. You define a chaos scenario once, schedule it, and get automated reports. This turns what was a multi-day manual exercise into a repeatable pipeline step.
How to use
- Install LitmusChaos via Helm into your Kubernetes cluster:
    helm repo add litmuschaos https://litmuschaos.github.io/litmus-helm
    helm install litmus litmuschaos/litmus \
      --namespace litmus --create-namespace
- Access the ChaosCenter dashboard to browse available experiments and create chaos scenarios.
- Define a ChaosEngine resource that targets your application's namespace and lists the experiments to run (pod-delete, pod-network-loss, etc.).
- Run the experiment and observe results in the ChaosCenter UI or via Kubernetes events.
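The last two steps can also be driven from the command line. The following is a minimal sketch, assuming the engine manifest is saved as `nginx-chaos.yaml` (a hypothetical filename) and that the Litmus CRDs and the chosen experiment are already installed in the cluster. The helper names `chaos_result_name` and `run_and_inspect` are illustrative, not part of Litmus.

```shell
#!/bin/sh
# Hypothetical helpers for running a ChaosEngine and reading its outcome.

# Litmus names the ChaosResult "<engine-name>-<experiment-name>".
chaos_result_name() {
  printf '%s-%s\n' "$1" "$2"
}

# Apply an engine manifest, then show the engine, its result, and recent events.
# Usage: run_and_inspect <manifest.yaml> <namespace> <engine> <experiment>
run_and_inspect() {
  manifest=$1 ns=$2 engine=$3 experiment=$4
  kubectl apply -f "$manifest" || return 1
  kubectl get chaosengine "$engine" -n "$ns"
  # The ChaosResult is created once the experiment starts,
  # so this call may need a retry shortly after the apply.
  kubectl describe chaosresult "$(chaos_result_name "$engine" "$experiment")" -n "$ns"
  kubectl get events -n "$ns" --sort-by=.lastTimestamp | tail -n 20
}
```

For the engine shown in the example section, the call would be `run_and_inspect nginx-chaos.yaml default nginx-chaos pod-delete`.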
Example
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nginx-chaos
  namespace: default
spec:
  appinfo:
    appns: default
    applabel: 'app=nginx'
    appkind: deployment
  chaosServiceAccount: litmus-admin
  experiments:
    - name: pod-delete
      spec:
        components:
          env:
            - name: TOTAL_CHAOS_DURATION
              value: '30'
            - name: CHAOS_INTERVAL
              value: '10'
With these settings, the experiment deletes pods labeled app=nginx roughly every 10 seconds over a 30-second window, so you can observe how the deployment recovers.
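As a rough check on those numbers, the number of deletion rounds can be estimated by dividing the total duration by the interval. This is only a sketch: `expected_rounds` is a hypothetical helper, and the real count depends on how the experiment's loop handles timing, so treat the result as an estimate.

```shell
# Estimate deletion rounds as TOTAL_CHAOS_DURATION / CHAOS_INTERVAL
# using integer division. Both arguments are in seconds.
expected_rounds() {
  total=$1 interval=$2
  [ "$interval" -gt 0 ] || { echo "interval must be > 0" >&2; return 1; }
  echo $(( total / interval ))
}

expected_rounds 30 10   # prints 3
```

To watch the recovery while the experiment runs, `kubectl get pods -n default -l app=nginx --watch` shows pods being terminated and replaced.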
Related on TokRepo
- DevOps Tools -- Browse automation and infrastructure tools for CI/CD pipelines
- Self-Hosted Solutions -- Explore self-hosted platforms for infrastructure management
Common pitfalls
- Running chaos experiments on production without proper blast radius limits can cause real outages. Always set namespace selectors and duration caps.
- Skipping the steady-state hypothesis means you cannot measure whether the experiment actually proved resilience.
- Forgetting RBAC permissions for the chaos service account leads to silent experiment failures with no useful feedback.
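The last two pitfalls can be caught with a preflight check before chaos is applied. The sketch below assumes a deployment named `nginx` (the source only gives the label `app=nginx`, so the name is hypothetical) and uses a ready-replica comparison as a stand-in for a steady-state hypothesis. The permission pairs are an illustrative subset rather than the full list an experiment needs, and `kubectl auth can-i --as` only works if your own account is allowed to impersonate service accounts.

```shell
# Build the impersonation subject for a service account.
sa_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Return 0 when the ready replica count is non-empty and equals the desired count.
replicas_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Steady-state check: is the deployment fully ready right now?
# Usage: steady_state_ok <namespace> <deployment>
steady_state_ok() {
  ns=$1 deploy=$2
  desired=$(kubectl get deployment "$deploy" -n "$ns" -o jsonpath='{.spec.replicas}')
  ready=$(kubectl get deployment "$deploy" -n "$ns" -o jsonpath='{.status.readyReplicas}')
  replicas_match "$ready" "$desired"
}

# RBAC check: can the chaos service account perform a few key actions?
# Usage: rbac_ok <namespace> <service-account>
rbac_ok() {
  ns=$1 sa=$2
  subject=$(sa_subject "$ns" "$sa")
  for pair in "delete pods" "list pods" "create events" "create jobs.batch"; do
    # $pair is left unquoted on purpose so it splits into verb and resource.
    if ! kubectl auth can-i $pair -n "$ns" --as "$subject" --quiet; then
      echo "missing permission: $pair" >&2
      return 1
    fi
  done
}
```

A possible use is `steady_state_ok default nginx && rbac_ok default litmus-admin` before applying the engine, and `steady_state_ok default nginx` again once the experiment has finished.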
Frequently Asked Questions
Which Kubernetes versions does LitmusChaos support?
LitmusChaos supports Kubernetes 1.17 and above. It works with managed clusters on EKS, GKE, and AKS as well as self-managed clusters. Check the official docs for the latest compatibility matrix.
Can LitmusChaos be integrated into CI/CD pipelines?
Yes. LitmusChaos experiments can be triggered via kubectl or the LitmusChaos API, so they fit into CI/CD pipelines as a post-deploy verification step. GitHub Actions and GitLab CI examples are available in the docs.
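A pipeline step built on kubectl might look like the sketch below. It assumes the verdict is exposed at `.status.experimentStatus.verdict` on the ChaosResult (check the field path against your Litmus version) and treats any verdict other than `Pass` as a failed step. The function names are hypothetical.

```shell
# Map a ChaosResult verdict to a CI exit code: only "Pass" succeeds.
verdict_exit_code() {
  case "$1" in
    Pass) echo 0 ;;
    *)    echo 1 ;;
  esac
}

# Poll the ChaosResult until a final verdict appears or the timeout expires.
# Usage: wait_for_verdict <namespace> <chaosresult-name> [timeout-seconds]
wait_for_verdict() {
  ns=$1 result=$2 timeout=${3:-300}
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    verdict=$(kubectl get chaosresult "$result" -n "$ns" \
      -o jsonpath='{.status.experimentStatus.verdict}' 2>/dev/null || true)
    case "$verdict" in
      ""|Awaited) ;;                      # not finished yet, keep polling
      *) echo "$verdict"; return 0 ;;     # Pass, Fail, Stopped, ...
    esac
    sleep 5
    elapsed=$(( elapsed + 5 ))
  done
  echo "Timeout"
}

# Example pipeline usage (commented out so the file can be sourced safely):
#   verdict=$(wait_for_verdict default nginx-chaos-pod-delete 300)
#   exit "$(verdict_exit_code "$verdict")"
```

Keeping the verdict-to-exit-code mapping in its own function makes the gate easy to adjust, for example if a stopped experiment should not fail the build.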
Is LitmusChaos safe to run in production?
LitmusChaos is designed for production use with proper safeguards. Use namespace selectors, duration limits, and abort conditions to control blast radius. Start with non-critical workloads and expand coverage gradually.
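One abort mechanism is to set the engine's `engineState` to `stop`, which halts a running experiment. A minimal sketch, with hypothetical helper names and the engine from the example section as the assumed target:

```shell
# JSON merge patch that switches a ChaosEngine to the "stop" state.
abort_patch() {
  printf '%s\n' '{"spec":{"engineState":"stop"}}'
}

# Abort a running experiment by patching its ChaosEngine.
# Usage: abort_chaos <namespace> <engine-name>
abort_chaos() {
  ns=$1 engine=$2
  kubectl patch chaosengine "$engine" -n "$ns" --type merge -p "$(abort_patch)"
}
```

For example, `abort_chaos default nginx-chaos` would stop the pod-delete run early if the deployment is not recovering as expected.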
How does LitmusChaos differ from Chaos Monkey?
Chaos Monkey from Netflix randomly terminates instances. LitmusChaos offers a broader experiment library (network, disk, CPU, DNS) with declarative, Kubernetes-native definitions and a visual dashboard for orchestration.
How do I monitor experiment results?
ChaosCenter provides a web dashboard with experiment history, pass/fail verdicts, and resilience scores. Experiments also emit Kubernetes events and can export Prometheus metrics for use in Grafana dashboards.