Scripts · Apr 16, 2026 · 3 min read

LitmusChaos — Cloud-Native Chaos Engineering for Kubernetes

Inject controlled failures into your Kubernetes workloads to test resilience. A CNCF incubating project with a library of 50+ chaos experiments.

TL;DR
LitmusChaos runs controlled failure experiments on Kubernetes to validate system resilience before incidents happen.
§01

What it is

LitmusChaos is a CNCF incubating project that provides a framework for practicing chaos engineering on Kubernetes. It ships with a library of 50+ pre-built chaos experiments (pod kill, network latency, CPU hog, disk fill, and more) that you inject into running workloads to verify they handle failures gracefully.

The tool targets SREs, platform engineers, and DevOps teams who manage Kubernetes clusters in production. If you need to prove your services survive node failures, network partitions, or resource exhaustion, LitmusChaos gives you repeatable experiments with observable outcomes.

§02

How it saves time or tokens

Manually testing failure scenarios means writing custom scripts, coordinating team members, and hoping you covered the right cases. LitmusChaos replaces that ad-hoc process with a catalog of pre-built experiments and the ChaosCenter dashboard. You define a chaos scenario once, schedule it, and get automated reports. An exercise that can otherwise take days of manual work becomes a repeatable pipeline step.

§03

How to use

  1. Install LitmusChaos via Helm into your Kubernetes cluster:

     helm repo add litmuschaos https://litmuschaos.github.io/litmus-helm
     helm install litmus litmuschaos/litmus \
       --namespace litmus --create-namespace

  2. Access the ChaosCenter dashboard to browse available experiments and create chaos scenarios.
  3. Define a ChaosEngine resource targeting your application namespace and select experiments (pod-delete, network-loss, etc.).
  4. Run the experiment and observe results in the ChaosCenter UI or via Kubernetes events.
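
The last step can also be checked from a script rather than the UI. The following is a minimal sketch, assuming the experiment records its outcome in a ChaosResult resource named `<engine>-<experiment>` with `phase` and `verdict` under `status.experimentStatus`. That layout is an assumption drawn from the Litmus documentation, not from this article, and the sample payload is illustrative rather than real cluster output.

```python
import json
from typing import Tuple


def read_verdict(chaos_result_json: str) -> Tuple[str, str]:
    """Return (phase, verdict) from a ChaosResult serialized as JSON.

    Missing fields are reported as "Unknown" instead of raising, because a
    result that has not been populated yet may lack the status block.
    """
    result = json.loads(chaos_result_json)
    status = result.get("status", {}).get("experimentStatus", {})
    return status.get("phase", "Unknown"), status.get("verdict", "Unknown")


# Illustrative payload only: the shape is assumed, the values are made up.
sample = json.dumps({
    "kind": "ChaosResult",
    "metadata": {"name": "nginx-chaos-pod-delete", "namespace": "default"},
    "status": {"experimentStatus": {"phase": "Completed", "verdict": "Pass"}},
})

phase, verdict = read_verdict(sample)
print(f"phase={phase} verdict={verdict}")
```

On a real cluster the JSON would typically come from something like `kubectl get chaosresult nginx-chaos-pod-delete -n default -o json`.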
§04

Example

apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nginx-chaos
  namespace: default
spec:
  # 'active' tells the chaos operator to start the run; 'stop' aborts it
  engineState: active
  appinfo:
    appns: default
    applabel: 'app=nginx'
    appkind: deployment
  chaosServiceAccount: litmus-admin
  experiments:
    - name: pod-delete
      spec:
        components:
          env:
            - name: TOTAL_CHAOS_DURATION
              value: '30'
            - name: CHAOS_INTERVAL
              value: '10'

With these settings, the experiment deletes pods matching app=nginx in the default namespace roughly every 10 seconds over a 30-second window, letting you observe how the deployment recovers.
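
If you run the same experiment against several workloads, it may help to generate the manifest instead of copying it. The sketch below mirrors the fields of the ChaosEngine example above. The function name `build_chaos_engine` and its parameters are hypothetical. The `engineState: active` field is an addition that the example may not carry: as far as the Litmus docs describe it, the operator needs it before it starts a run, so treat it as an assumption.

```python
import json


def build_chaos_engine(name, namespace, app_label, duration_s, interval_s,
                       service_account="litmus-admin"):
    """Build a pod-delete ChaosEngine manifest as a plain dict."""
    if interval_s > duration_s:
        # An interval longer than the whole window leaves no room for a
        # second deletion round, which is almost certainly a typo.
        raise ValueError("CHAOS_INTERVAL must not exceed TOTAL_CHAOS_DURATION")
    return {
        "apiVersion": "litmuschaos.io/v1alpha1",
        "kind": "ChaosEngine",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "engineState": "active",  # assumption: not shown in the article
            "appinfo": {
                "appns": namespace,
                "applabel": app_label,
                "appkind": "deployment",
            },
            "chaosServiceAccount": service_account,
            "experiments": [{
                "name": "pod-delete",
                "spec": {"components": {"env": [
                    # Env values are strings in the manifest, as in the example.
                    {"name": "TOTAL_CHAOS_DURATION", "value": str(duration_s)},
                    {"name": "CHAOS_INTERVAL", "value": str(interval_s)},
                ]}},
            }],
        },
    }


engine = build_chaos_engine("nginx-chaos", "default", "app=nginx", 30, 10)
print(json.dumps(engine, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed output can be piped to `kubectl apply -f -`.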

§05

Common pitfalls

  • Running chaos experiments on production without proper blast radius limits can cause real outages. Always set namespace selectors and duration caps.
  • Skipping the steady-state hypothesis means you cannot measure whether the experiment actually proved resilience.
  • Forgetting RBAC permissions for the chaos service account leads to silent experiment failures with no useful feedback.
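
Some of these pitfalls can be caught before a manifest ever reaches the cluster. The following is a rough pre-flight check, not a Litmus feature: the protected-namespace list and the 120-second cap are hypothetical policy values, and the function only inspects the fields used in the ChaosEngine example from this article.

```python
from typing import Dict, List

PROTECTED_NAMESPACES = {"kube-system", "production"}  # hypothetical policy
MAX_DURATION_S = 120                                  # hypothetical cap


def preflight(engine: Dict) -> List[str]:
    """Return a list of blast-radius problems; an empty list means OK."""
    problems = []
    spec = engine.get("spec", {})
    appinfo = spec.get("appinfo", {})

    if appinfo.get("appns") in PROTECTED_NAMESPACES:
        problems.append(f"target namespace {appinfo['appns']!r} is protected")
    if not appinfo.get("applabel"):
        problems.append("no applabel set, so the target is not narrowed down")
    if not spec.get("chaosServiceAccount"):
        problems.append("no chaosServiceAccount set")

    for exp in spec.get("experiments", []):
        env_list = exp.get("spec", {}).get("components", {}).get("env", [])
        env = {item["name"]: item["value"] for item in env_list}
        duration = env.get("TOTAL_CHAOS_DURATION")
        if duration is None or not str(duration).isdigit():
            problems.append(f"{exp.get('name')}: no explicit duration in seconds")
        elif int(duration) > MAX_DURATION_S:
            problems.append(f"{exp.get('name')}: duration {duration}s exceeds cap")
    return problems


def make_engine(namespace: str, duration: str) -> Dict:
    """Small helper to build test inputs shaped like the article's example."""
    return {"spec": {
        "appinfo": {"appns": namespace, "applabel": "app=nginx"},
        "chaosServiceAccount": "litmus-admin",
        "experiments": [{"name": "pod-delete", "spec": {"components": {"env": [
            {"name": "TOTAL_CHAOS_DURATION", "value": duration},
        ]}}}],
    }}


example = make_engine("default", "30")
risky = make_engine("kube-system", "600")

print(preflight(example))
print(preflight(risky))
```

A check like this does not verify RBAC itself; it only confirms that a service account is named, so permission problems still need to be tested on the cluster.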

Frequently Asked Questions

What Kubernetes versions does LitmusChaos support?

LitmusChaos supports Kubernetes 1.17 and above. It works with managed clusters on EKS, GKE, AKS, and self-managed clusters. Check the official docs for the latest compatibility matrix.

Can I run LitmusChaos in CI/CD pipelines?

Yes. LitmusChaos experiments can be triggered via kubectl or the LitmusChaos API, making them easy to integrate into CI/CD pipelines as a post-deploy verification step. GitHub Actions and GitLab CI examples are available in the docs.
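
A post-deploy gate of this kind could look roughly like the sketch below. It assumes the verdict is readable at `status.experimentStatus.verdict` on a ChaosResult and stays at `Awaited` while the run is in progress. Both are assumptions to confirm against the Litmus docs for your version. The helper names, the timeout, and the `Timeout` return value are hypothetical. The fetch function is injected so the polling logic can be exercised without a cluster.

```python
import subprocess
import time
from typing import Callable


def kubectl_verdict(result_name: str, namespace: str) -> str:
    """Read the verdict with kubectl (assumed field path, see lead-in)."""
    out = subprocess.run(
        ["kubectl", "get", "chaosresult", result_name, "-n", namespace,
         "-o", "jsonpath={.status.experimentStatus.verdict}"],
        capture_output=True, text=True, check=False,
    )
    return out.stdout.strip()


def wait_for_verdict(fetch: Callable[[], str], timeout_s: int = 300,
                     poll_s: int = 5, sleep=time.sleep) -> str:
    """Poll until the verdict is final or the timeout elapses."""
    waited = 0
    while True:
        verdict = fetch()
        if verdict and verdict != "Awaited":
            return verdict
        if waited >= timeout_s:
            return "Timeout"
        sleep(poll_s)
        waited += poll_s


def exit_code(verdict: str) -> int:
    """Map a verdict to a CI exit code: only an explicit Pass succeeds."""
    return 0 if verdict == "Pass" else 1


# Dry run with a scripted fetch instead of a real cluster.
scripted = iter(["", "Awaited", "Pass"])
final = wait_for_verdict(lambda: next(scripted), sleep=lambda s: None)
print(final, exit_code(final))
```

In a pipeline, the fetch would be `lambda: kubectl_verdict("nginx-chaos-pod-delete", "default")` and the job would end with `sys.exit(exit_code(final))`. Treating anything other than `Pass` as a failure is a deliberately conservative choice.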

Is LitmusChaos safe for production use?

LitmusChaos is designed for production use with proper safeguards. Use namespace selectors, duration limits, and abort conditions to control blast radius. Start with non-critical workloads and expand coverage gradually.

How does LitmusChaos compare to Chaos Monkey?

Chaos Monkey from Netflix randomly terminates instances. LitmusChaos offers a broader experiment library (network, disk, CPU, DNS) with declarative Kubernetes-native definitions and a visual dashboard for orchestration.

What observability does LitmusChaos provide?

ChaosCenter provides a web dashboard with experiment history, pass/fail verdicts, and resilience scores. Experiments also emit Kubernetes events and support Prometheus metrics export for integration with Grafana dashboards.
