Scripts · April 16, 2026 · 1 min read

LitmusChaos — Cloud-Native Chaos Engineering for Kubernetes

Inject controlled failures into your Kubernetes workloads to test resilience. A CNCF incubating project with a library of 50+ chaos experiments.

Introduction

LitmusChaos is a CNCF incubating project that brings chaos engineering to Kubernetes. It provides a framework for running controlled failure experiments, such as pod kills, network delays, and CPU stress, so teams can verify that their applications recover gracefully under adverse conditions.

What LitmusChaos Does

  • Defines chaos experiments as Kubernetes custom resources (ChaosEngine, ChaosExperiment, ChaosResult) written in declarative YAML
  • Offers a ChaosHub with 50+ prebuilt experiments for pods, nodes, and infrastructure
  • Provides a web-based ChaosCenter for designing, scheduling, and observing experiments
  • Supports steady-state hypothesis checks to validate resilience automatically
  • Integrates with CI/CD pipelines to run chaos tests as part of deployment validation
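
As a minimal sketch of the declarative model, the Python function below builds a ChaosEngine manifest for the prebuilt pod-delete experiment and prints it as JSON, which kubectl also accepts (for example via `kubectl apply -f -`). The field names follow the litmuschaos.io/v1alpha1 ChaosEngine schema as we understand it; the engine name, the `app=nginx` label, and the `pod-delete-sa` service account are placeholders, `build_pod_delete_engine` is a hypothetical helper, and the sketch assumes the pod-delete ChaosExperiment and matching RBAC are already installed in the target namespace.

```python
import json


def build_pod_delete_engine(name, namespace, app_label, service_account,
                            duration_s=30, interval_s=10):
    """Return a ChaosEngine manifest (as a dict) for the pod-delete experiment.

    Hypothetical helper: field names follow the litmuschaos.io/v1alpha1
    ChaosEngine schema; label, namespace and service account are placeholders.
    """
    return {
        "apiVersion": "litmuschaos.io/v1alpha1",
        "kind": "ChaosEngine",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "engineState": "active",
            # Target selection: only workloads matching this label are eligible.
            "appinfo": {
                "appns": namespace,
                "applabel": app_label,
                "appkind": "deployment",
            },
            # Assumed to exist already, with RBAC for the experiment.
            "chaosServiceAccount": service_account,
            "experiments": [
                {
                    "name": "pod-delete",
                    "spec": {
                        "components": {
                            # Experiment tunables are passed as string env vars.
                            "env": [
                                {"name": "TOTAL_CHAOS_DURATION", "value": str(duration_s)},
                                {"name": "CHAOS_INTERVAL", "value": str(interval_s)},
                                {"name": "FORCE", "value": "false"},
                            ]
                        }
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    engine = build_pod_delete_engine("nginx-chaos", "default", "app=nginx", "pod-delete-sa")
    # JSON is valid YAML, so this output can be piped to `kubectl apply -f -`.
    print(json.dumps(engine, indent=2))
```

Keeping the manifest as data in code makes it easy to template per environment while still committing the rendered YAML to version control.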

Architecture Overview

LitmusChaos consists of a control plane (ChaosCenter) and an execution plane. ChaosCenter is a web application backed by MongoDB that manages experiment definitions and schedules. The execution plane runs in each target cluster as a set of operators: the Chaos Operator watches ChaosEngine custom resources and launches the experiment pods that inject the specified failure. Each run's outcome is recorded in a ChaosResult resource and reported back to ChaosCenter for analysis and visualization.
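
To make the reporting path concrete, here is a hedged sketch that tallies experiment verdicts from ChaosResult objects, for example the `items` list returned by `kubectl get chaosresults -n <namespace> -o json`. It assumes the verdict is stored at `status.experimentStatus.verdict` (values such as Pass, Fail, or Awaited); `summarize_results` is a hypothetical helper, not part of Litmus.

```python
from collections import Counter


def summarize_results(chaos_results):
    """Tally verdicts from ChaosResult objects (already parsed from JSON).

    Hypothetical helper. Assumes the verdict is at
    status.experimentStatus.verdict; objects without a status yet are
    counted as "Awaited". Returns (verdict counts, names of failed results).
    """
    verdicts = Counter()
    failed = []
    for res in chaos_results:
        status = res.get("status", {}).get("experimentStatus", {})
        verdict = status.get("verdict", "Awaited")
        verdicts[verdict] += 1
        if verdict == "Fail":
            failed.append(res["metadata"]["name"])
    return dict(verdicts), failed


# Example with hand-written objects standing in for `kubectl ... -o json` output.
sample = [
    {"metadata": {"name": "nginx-chaos-pod-delete"},
     "status": {"experimentStatus": {"verdict": "Pass"}}},
    {"metadata": {"name": "api-chaos-pod-network-latency"},
     "status": {"experimentStatus": {"verdict": "Fail"}}},
]
counts, failed_names = summarize_results(sample)
print(counts, failed_names)
```

A CI job could fail the pipeline whenever `failed_names` is non-empty, which is one way to wire chaos verdicts into deployment validation.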

Self-Hosting & Configuration

  • Deploy ChaosCenter via Helm chart or kubectl manifests into a management cluster
  • Register target clusters as Chaos Delegates through the ChaosCenter UI
  • Browse the ChaosHub to select and customize experiments
  • Define ChaosWorkflows combining multiple experiments with steady-state checks
  • Schedule recurring chaos tests via cron expressions in the workflow definition
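
Litmus 2.x chaos workflows are built on Argo Workflows, so a scheduled, multi-experiment run can be pictured as an Argo CronWorkflow. The sketch below is a deliberate simplification: `build_cron_workflow` and `validate_cron` are hypothetical helpers, each ChaosEngine is created by an Argo resource template in sequence, and the sketch does not wait for experiments to finish, install ChaosExperiment definitions, or configure the service account that ChaosCenter-generated workflows include.

```python
import json


def validate_cron(schedule):
    """Minimal check: a standard cron expression has five space-separated fields."""
    fields = schedule.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(fields)}: {schedule!r}")
    return schedule


def build_cron_workflow(name, namespace, schedule, engines):
    """Wrap ChaosEngine manifests in an Argo CronWorkflow (hypothetical helper).

    Each engine gets its own step; steps run one after another, but a
    resource template without a success condition only creates the object
    and does not wait for the experiment to complete.
    """
    templates = [{
        "name": "main",
        # Each inner list is one step group; groups execute sequentially.
        "steps": [[{"name": f"chaos-{i}", "template": f"run-engine-{i}"}]
                  for i in range(len(engines))],
    }]
    for i, engine in enumerate(engines):
        templates.append({
            "name": f"run-engine-{i}",
            # Argo resource templates take the manifest as a string.
            "resource": {"action": "create", "manifest": json.dumps(engine)},
        })
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "CronWorkflow",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "schedule": validate_cron(schedule),
            # Skip a scheduled run while the previous one is still active.
            "concurrencyPolicy": "Forbid",
            "workflowSpec": {"entrypoint": "main", "templates": templates},
        },
    }
```

The `Forbid` concurrency policy is a conservative choice for chaos: overlapping runs would compound faults and make results hard to attribute.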

Key Features

  • CNCF incubating project with an active community and vendor-neutral governance
  • 50+ prebuilt experiments covering pod, node, network, DNS, and cloud provider faults
  • GitOps-native experiment management with version-controlled workflow definitions
  • Observability integration with Prometheus metrics and Grafana dashboards
  • Multi-cluster chaos orchestration from a single ChaosCenter instance

Comparison with Similar Tools

  • Chaos Mesh — CNCF project with similar Kubernetes-native chaos; LitmusChaos offers a richer web UI and ChaosHub marketplace
  • Gremlin — Commercial SaaS chaos platform; LitmusChaos is fully open-source and self-hosted
  • AWS Fault Injection Simulator — AWS-only managed service; LitmusChaos works on any Kubernetes cluster
  • Pumba — Docker-level chaos tool; LitmusChaos operates at the Kubernetes abstraction layer with CRD-driven workflows

FAQ

Q: Can LitmusChaos cause production outages? A: It can if targeting is too broad. Experiments are scoped by namespace, labels, and blast-radius controls such as the percentage of matching pods to affect. Start with non-production clusters and narrow targeting to reduce risk.
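
Blast radius largely comes down to how many matching pods an experiment touches. Experiments such as pod-delete expose a percentage setting (PODS_AFFECTED_PERC) for this; the arithmetic below is a hypothetical model of it, assuming the count is rounded down with a floor of one pod, and `pods_to_target` is an illustrative name rather than Litmus code.

```python
def pods_to_target(total_pods, affected_perc):
    """Hypothetical model of a PODS_AFFECTED_PERC-style setting.

    Assumption: take the given percentage of matching pods, rounded down,
    but never fewer than one pod (so 0% still targets a single replica).
    """
    if total_pods <= 0:
        return 0
    if not 0 <= affected_perc <= 100:
        raise ValueError("affected_perc must be between 0 and 100")
    # Integer arithmetic avoids float rounding surprises.
    return max(1, total_pods * affected_perc // 100)


print(pods_to_target(10, 50))  # half of ten replicas
print(pods_to_target(3, 10))   # rounds down to zero, so the floor of one applies
```

Under this model, starting at a low percentage on a large deployment keeps the initial blast radius to a single replica.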

Q: Does it require ChaosCenter to run experiments? A: No. You can run experiments directly via ChaosEngine CRDs and kubectl without ChaosCenter, though the UI simplifies workflow management.
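
Running without ChaosCenter still requires the Litmus CRDs and the Chaos Operator in the cluster. With those in place, creating a ChaosEngine programmatically is equivalent to `kubectl apply`; this minimal sketch assumes the official Kubernetes Python client, whose `CustomObjectsApi.create_namespaced_custom_object` method takes the group, version, namespace, plural, and body. `apply_engine` is a hypothetical wrapper, and the API object is passed in so it can be swapped for a fake in tests.

```python
def apply_engine(api, engine):
    """Create a ChaosEngine through the Kubernetes custom-objects API.

    Hypothetical wrapper. `api` is expected to be a
    kubernetes.client.CustomObjectsApi instance; any object with the same
    method works. The namespace is read from the manifest's metadata.
    """
    return api.create_namespaced_custom_object(
        group="litmuschaos.io",
        version="v1alpha1",
        namespace=engine["metadata"]["namespace"],
        plural="chaosengines",
        body=engine,
    )


# Usage against a real cluster (requires `pip install kubernetes`):
#   from kubernetes import client, config
#   config.load_kube_config()
#   apply_engine(client.CustomObjectsApi(), engine_manifest)
```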

Q: How do I create a custom chaos experiment? A: Write a Go or shell-based experiment, package it as a container image, and register it in a custom ChaosHub or inline in your workflow.
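
Whatever the language, an experiment typically reads its tunables, injects the fault repeatedly for the chaos duration, and reverts it afterwards. The Python skeleton below models only that control flow, under stated assumptions: it reads TOTAL_CHAOS_DURATION and CHAOS_INTERVAL in seconds, the names `run_experiment`, `inject`, and `revert` are hypothetical, and container packaging plus the ChaosExperiment definition that references the image are not shown. Most of Litmus's own experiments live in the Go-based litmus-go repository.

```python
import os
import time


def run_experiment(inject, revert, env=os.environ, sleep=time.sleep):
    """Hypothetical custom-experiment loop.

    Reads TOTAL_CHAOS_DURATION and CHAOS_INTERVAL (seconds) from `env`,
    calls `inject` once per interval until the duration elapses, and always
    calls `revert` at the end. Returns the number of injection rounds.
    """
    duration = int(env.get("TOTAL_CHAOS_DURATION", "30"))
    interval = int(env.get("CHAOS_INTERVAL", "10"))
    if interval <= 0:
        raise ValueError("CHAOS_INTERVAL must be positive")
    rounds = 0
    elapsed = 0
    try:
        while elapsed < duration:
            inject()
            rounds += 1
            # Do not overshoot the total chaos duration on the last round.
            step = min(interval, duration - elapsed)
            sleep(step)
            elapsed += step
    finally:
        # Revert even if injection fails, so the target is left healthy.
        revert()
    return rounds
```

Passing `env` and `sleep` as parameters keeps the loop testable without real waiting, and the `finally` clause guarantees the revert step runs even when an injection raises.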

Q: What steady-state hypothesis checks are supported? A: Built-in probes (httpProbe, cmdProbe, k8sProbe, promProbe) check HTTP endpoints, command output, Kubernetes resource conditions, and Prometheus queries.
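
Command and Prometheus probes compare an observed value against an expected one using a criteria operator, and a run's ChaosResult reports a probe success percentage. The sketch below is a simplified model of both ideas: it covers only the basic numeric comparison operators (Litmus supports additional criteria and string comparisons), and `check` and `probe_success_percentage` are hypothetical helpers.

```python
import operator

# Basic numeric comparison operators, modeled on probe "criteria" values.
_CRITERIA = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
}


def check(actual, criteria, expected):
    """Return True if `actual <criteria> expected` holds."""
    try:
        op = _CRITERIA[criteria]
    except KeyError:
        raise ValueError(f"unsupported criteria: {criteria!r}") from None
    return op(actual, expected)


def probe_success_percentage(outcomes):
    """Share of probe evaluations that passed, as a rounded integer percentage."""
    if not outcomes:
        return 0
    return round(100 * sum(outcomes) / len(outcomes))
```

For example, a Prometheus probe asserting that an error ratio stays at or below 0.01 corresponds to `check(observed_ratio, "<=", 0.01)` in this model.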
