Introduction
ChaosBlade is an open-source chaos engineering toolkit created by Alibaba that helps teams verify the resilience of distributed systems. It provides a unified CLI and a Kubernetes operator for injecting faults at the OS, container, pod, and application layers, so teams do not have to write custom failure scripts.
What ChaosBlade Does
- Injects network faults (delay, loss, corruption, partition) at the OS and container level
- Simulates CPU, memory, disk, and process failures on bare-metal and virtual hosts
- Targets Kubernetes pods, nodes, and containers through a CRD-based operator
- Injects application-level faults into JVM processes (method delays, thrown exceptions, thread pool exhaustion)
- Provides a destroy command to cleanly roll back any active experiment
Architecture Overview
ChaosBlade is built on a plugin model. The core CLI (blade) dispatches experiment commands to an executor specific to each target: the OS executor (chaosblade-exec-os) for host-level faults, the Docker executor (chaosblade-exec-docker) for containers, and the chaosblade-operator for Kubernetes resources. JVM faults use a Java agent attached to the target process. Each experiment is tracked by a unique ID (UID), so it can be queried or destroyed independently of any other experiment. The Kubernetes operator watches ChaosBlade custom resources and translates them into fault injections on the selected pods.
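The create, track, and destroy lifecycle described above can be sketched in Python as a thin layer over the blade CLI. This is a minimal sketch, not part of ChaosBlade: the helper names (create_cmd, status_cmd, destroy_cmd, parse_uid) are hypothetical, and it assumes that blade create prints a JSON object whose "result" field carries the experiment UID, which should be checked against the installed version.

```python
import json


def create_cmd(target, action, **flags):
    """Build argv for `blade create <target> <action> --flag value ...`."""
    argv = ["blade", "create", target, action]
    for name, value in flags.items():
        # Python keyword names use underscores; CLI flags use hyphens.
        argv += ["--" + name.replace("_", "-"), str(value)]
    return argv


def status_cmd(uid):
    """Build argv that queries one experiment by its UID."""
    return ["blade", "status", uid]


def destroy_cmd(uid):
    """Build argv that rolls back one experiment by its UID."""
    return ["blade", "destroy", uid]


def parse_uid(stdout):
    """Extract the experiment UID from blade's reply.

    Assumed reply shape: {"code": 200, "success": true, "result": "<uid>"}.
    """
    reply = json.loads(stdout)
    if not reply.get("success"):
        raise RuntimeError(f"blade reported a failure: {reply}")
    return reply["result"]
```

For example, create_cmd("network", "delay", time=3000, interface="eth0") yields the argv for a 3000 ms delay on eth0. The list can be passed to subprocess.run, and the UID parsed from its output can then be handed to status_cmd or destroy_cmd.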
Self-Hosting & Configuration
- Download the prebuilt binary for Linux or macOS from the GitHub releases page
- Deploy the chaosblade-operator via Helm for Kubernetes chaos experiments
- Define experiments as YAML custom resources (kind ChaosBlade) that specify target scope, fault type, and duration
- Use the blade CLI directly for ad-hoc host or container experiments without Kubernetes
- Integrate with the ChaosBlade Box web platform for visual experiment orchestration and scheduling
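For the Kubernetes path, an experiment resource can be generated instead of written by hand. The sketch below builds one as a Python dict and prints it as JSON, which kubectl apply -f - accepts alongside YAML. The apiVersion and the scope/target/action/matchers layout follow the operator's published examples as far as known here and should be verified against the CRD installed in the cluster. The function name chaosblade_manifest, the resource name delay-demo-pods, and the label app=demo are hypothetical.

```python
import json


def chaosblade_manifest(name, scope, target, action, matchers):
    """Return a ChaosBlade custom resource as a plain dict."""
    return {
        "apiVersion": "chaosblade.io/v1alpha1",  # assumed API version
        "kind": "ChaosBlade",
        "metadata": {"name": name},
        "spec": {
            "experiments": [
                {
                    "scope": scope,    # e.g. node, pod, container
                    "target": target,  # e.g. network, cpu, disk
                    "action": action,  # e.g. delay, loss, fullload
                    # Each matcher is a name plus a list of string values.
                    "matchers": [
                        {"name": key, "value": [str(v) for v in values]}
                        for key, values in matchers.items()
                    ],
                }
            ]
        },
    }


manifest = chaosblade_manifest(
    name="delay-demo-pods",
    scope="pod",
    target="network",
    action="delay",
    matchers={
        "namespace": ["default"],  # limit the blast radius to one namespace
        "labels": ["app=demo"],    # and to pods carrying this label
        "interface": ["eth0"],
        "time": [3000],            # delay in milliseconds
    },
)
print(json.dumps(manifest, indent=2))
```

Piping the output to kubectl apply -f - creates the experiment, and deleting the resource is what rolls the fault back.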
Key Features
- Unified experiment model covers hosts, Docker containers, Kubernetes pods, and JVM applications with the same CLI syntax
- Atomic experiment design ensures every fault has a matching destroy command for safe rollback
- Kubernetes label selectors and namespace scoping limit blast radius to specific pods or services
- JVM sandbox engine injects faults at the bytecode level without restarting the target application
- Experiment history and status tracking via the blade status command for audit and debugging
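JVM experiments differ from host experiments in that the Java agent has to be attached before a fault can be created. The following sketch lists the two commands in order for a method-delay experiment. The function name jvm_delay_plan and the example class, method, and process names are hypothetical, and the flags (--process, --classname, --methodname, --time) are given as understood from the CLI documentation, so confirm them with blade create jvm delay --help.

```python
def jvm_delay_plan(process, classname, methodname, delay_ms):
    """Return the ordered blade commands for a JVM method-delay experiment."""
    return [
        # 1. Attach the Java agent to the running JVM (no restart needed).
        ["blade", "prepare", "jvm", "--process", process],
        # 2. Delay each call to classname#methodname by delay_ms milliseconds.
        [
            "blade", "create", "jvm", "delay",
            "--process", process,
            "--classname", classname,
            "--methodname", methodname,
            "--time", str(delay_ms),
        ],
    ]


plan = jvm_delay_plan("demo-app", "com.example.OrderService", "submit", 3000)
for argv in plan:
    print(" ".join(argv))
```

Each command returns its own UID. Cleanup runs in reverse order: blade destroy with the UID from the create step, then blade revoke with the UID from the prepare step to detach the agent.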
Comparison with Similar Tools
- Chaos Mesh — CNCF project with a web dashboard; ChaosBlade offers broader target coverage including JVM and host-level faults from a single CLI
- Litmus — Kubernetes-native chaos with ChaosHub experiment library; ChaosBlade provides a simpler CLI-first experience with less Kubernetes overhead
- Gremlin — commercial SaaS chaos platform; ChaosBlade is fully open-source and self-hosted
- Pumba — Docker-specific chaos tool; ChaosBlade supports Docker plus Kubernetes, hosts, and JVM targets
- Toxiproxy — network fault proxy for testing; ChaosBlade injects faults at the kernel level without proxying traffic
FAQ
Q: Is ChaosBlade safe to run in production? A: Yes, with precautions. Every experiment has a destroy command. Use Kubernetes label selectors to limit scope, and start with non-critical services.
Q: Does ChaosBlade require root access? A: Most OS-level experiments (network, disk, CPU) require root or equivalent privileges. JVM experiments can run as the application user.
Q: Can I schedule recurring chaos experiments? A: The ChaosBlade Box web platform supports scheduled experiments. The CLI itself is stateless and can be triggered by cron or CI pipelines.
Q: What languages does ChaosBlade support for application-level faults? A: JVM-based languages (Java, Kotlin, Scala) are natively supported via the Java agent. C++ applications can be targeted via the cplus executor.
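The rollback and scheduling answers above combine naturally in a script that a cron job or CI pipeline can call: create the experiment, observe the system, and destroy the experiment even if the observation step fails. This is a sketch under stated assumptions rather than an official integration. The names run_blade and experiment are hypothetical, and the JSON reply shape ("success" and "result" fields) is assumed.

```python
import json
import subprocess
from contextlib import contextmanager


def run_blade(argv):
    """Run a blade command and return its stdout (requires blade on PATH)."""
    return subprocess.run(
        argv, capture_output=True, text=True, check=True
    ).stdout


@contextmanager
def experiment(create_argv, run=run_blade):
    """Create an experiment and guarantee that it is destroyed on exit."""
    reply = json.loads(run(create_argv))  # assumed shape: {"success": ..., "result": uid}
    if not reply.get("success"):
        raise RuntimeError(f"experiment was not created: {reply}")
    uid = reply["result"]
    try:
        yield uid
    finally:
        # Runs on normal exit and on exceptions alike.
        run(["blade", "destroy", uid])
```

A caller would write with experiment(["blade", "create", "cpu", "fullload", "--cpu-percent", "80"]) as uid: and place health checks inside the block. The run parameter exists so the wrapper can be exercised with a stub in tests without touching a real host. The blade create command is also documented with a --timeout flag that ends an experiment after a set number of seconds; it works as a second safety net alongside explicit destroy rather than a replacement for it.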