Scripts · Apr 15, 2026 · 3 min read

Chaos Mesh — Cloud-Native Chaos Engineering for Kubernetes

CNCF chaos-engineering platform that injects pod, network, IO, DNS, and kernel faults into Kubernetes clusters via CRDs.

TL;DR
Chaos Mesh lets you inject controlled failures into Kubernetes via CRDs to prove resilience.
§01

What it is

Chaos Mesh is a CNCF incubating project that provides a chaos engineering platform for Kubernetes. It lets platform and SRE teams run controlled, reproducible failure experiments against live clusters by expressing each fault as a Kubernetes custom resource, backed by the Custom Resource Definitions (CRDs) that Chaos Mesh installs. Experiments can therefore be versioned, scheduled, and gated in CI like any other Kubernetes manifest.

Chaos Mesh is suited for DevOps engineers, SREs, and platform teams who need to validate resilience claims before incidents happen in production.

§02

How it saves time or tokens

Without Chaos Mesh, teams write ad-hoc bash scripts or manually kill pods to test resilience. Chaos Mesh replaces that fragile approach with declarative manifests that can be applied, reverted, and automated in CI pipelines. A network-latency experiment that would take an hour to set up manually can be defined in a single YAML manifest and applied in seconds. The built-in Dashboard provides a visual workflow editor that further reduces setup time for complex multi-step game-day scenarios.

§03

How to use

  1. Install Chaos Mesh via Helm into your cluster:
helm repo add chaos-mesh https://charts.chaos-mesh.org
helm install chaos-mesh chaos-mesh/chaos-mesh \
  --namespace=chaos-mesh --create-namespace --version 2.6.3
  2. Define a chaos experiment as a custom resource. For example, inject 500ms network latency into a service:
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: web-latency
  namespace: default
spec:
  action: delay
  mode: all
  selector:
    labelSelectors:
      app: web
  delay:
    latency: '500ms'
    jitter: '50ms'
  duration: '2m'
  3. Save the manifest as latency.yaml, apply it with kubectl apply -f latency.yaml, and observe your service's behavior in the Chaos Mesh Dashboard or your existing monitoring stack.
§04

Example

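A multi-step workflow that kills a database pod and then injects a 60-second network partition on the web pods. The workflow itself does not verify recovery; confirm that afterwards in the Chaos Mesh Dashboard or your monitoring stack: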

apiVersion: chaos-mesh.org/v1alpha1
kind: Workflow
metadata:
  name: db-resilience-test
spec:
  entry: serial-steps
  templates:
    - name: serial-steps
      templateType: Serial
      children:
        - kill-db
        - network-partition
    - name: kill-db
      templateType: PodChaos
      podChaos:
        action: pod-kill
        mode: one
        selector:
          labelSelectors:
            app: postgres
    - name: network-partition
      templateType: NetworkChaos
      networkChaos:
        action: partition
        mode: all
        selector:
          labelSelectors:
            app: web
        direction: both
        duration: '60s'
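
If the database needs time to restart before the partition begins, a pause step can sit between the two faults. The fragment below is a sketch of that idea using a Suspend template. The template name wait-for-restart and the 30-second deadline are illustrative assumptions, not part of the workflow above.

```yaml
# Fragment of spec.templates from the workflow above, with a pause step added.
# The name "wait-for-restart" and the 30s value are assumptions for illustration.
templates:
  - name: serial-steps
    templateType: Serial
    children:
      - kill-db
      - wait-for-restart   # hypothetical pause between the two faults
      - network-partition
  - name: wait-for-restart
    templateType: Suspend
    deadline: 30s          # how long the workflow waits before continuing
  # kill-db and network-partition templates stay as defined above
```

A fixed pause is the simplest option. It does not check that the database is actually healthy again, so a generous value is safer than a tight one.
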
§05

Common pitfalls

  • Running chaos experiments in production without proper blast-radius controls. Always use label selectors and namespace scoping to limit the impact.
  • Forgetting to set a duration field, which can leave faults running indefinitely and cause real outages.
  • Not integrating experiments into CI/CD. One-off manual chaos runs provide limited value compared to automated regression chaos tests.
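
The first two pitfalls can be addressed directly in the manifest. The sketch below limits the blast radius to a quarter of the matching pods in one namespace and bounds the fault with a duration. The resource name, the staging namespace, and the percentages are assumptions chosen for illustration.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: web-pod-failure-limited   # hypothetical name
  namespace: staging              # assumed namespace
spec:
  action: pod-failure
  mode: fixed-percent             # affect only a share of the matching pods
  value: '25'                     # 25% of pods selected below
  duration: '30s'                 # fault is reverted after 30 seconds
  selector:
    namespaces:
      - staging                   # restrict targeting to one namespace
    labelSelectors:
      app: web
```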

Frequently Asked Questions

What types of faults can Chaos Mesh inject?

Chaos Mesh supports pod faults (kill, failure, container-kill), network chaos (latency, packet loss, partition, bandwidth throttle), IO faults (read/write latency, errors), DNS chaos, HTTP chaos, clock skew, and kernel-level faults via eBPF. Each fault type is a separate CRD.
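
As one illustration of how the fault types differ mainly in their action and action-specific fields, the following sketch swaps the latency example for packet loss. The resource name and the loss and correlation values are assumptions for illustration.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: web-packet-loss   # hypothetical name
  namespace: default
spec:
  action: loss            # packet loss instead of delay
  mode: one               # target a single matching pod
  selector:
    labelSelectors:
      app: web
  loss:
    loss: '25'            # percentage of packets dropped
    correlation: '25'     # correlation with the previous packet's outcome
  duration: '1m'
```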

Does Chaos Mesh require any special kernel modules?

Most fault types work without kernel modifications. Kernel-level chaos (like injecting syscall faults) uses eBPF and requires a Linux kernel 4.18 or later. Pod and network chaos work on any standard Kubernetes cluster.

Can I schedule chaos experiments to run automatically?

Yes. Chaos Mesh provides a Schedule CRD that runs experiments on a cron schedule. You can also trigger experiments from CI pipelines by applying experiment manifests with kubectl, making it easy to run chaos tests on every deployment.
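
A minimal sketch of a Schedule that wraps the earlier latency experiment might look like the following. The resource name, the 03:00 daily cron expression, and the history and concurrency settings are assumptions for illustration.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: Schedule
metadata:
  name: nightly-web-latency   # hypothetical name
  namespace: default
spec:
  schedule: '0 3 * * *'       # assumed: every day at 03:00
  historyLimit: 2             # keep the two most recent runs
  concurrencyPolicy: Forbid   # skip a run if the previous one is still active
  type: NetworkChaos
  networkChaos:
    action: delay
    mode: all
    selector:
      labelSelectors:
        app: web
    delay:
      latency: '500ms'
      jitter: '50ms'
    duration: '2m'
```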

How does Chaos Mesh compare to Litmus Chaos?

Both are CNCF chaos engineering projects for Kubernetes. Chaos Mesh uses CRDs natively and includes a built-in Dashboard with a visual workflow editor. Litmus uses a hub-based model with pre-built experiment charts. The choice depends on whether you prefer CRD-native workflows or a marketplace of pre-built experiments.

Is Chaos Mesh safe to use in production?

Chaos Mesh includes safety mechanisms: namespace-scoped permissions, label selectors for targeting, duration fields that bound how long a fault runs, and RBAC integration. The duration field is something you must set yourself, and any chaos tool can cause real impact if misconfigured. Start in staging environments and gradually expand to production with proper guardrails.
