Introduction
CleanRL provides single-file implementations of RL algorithms where each file contains the complete training loop, network definitions, and logging in one place. This approach prioritizes readability and hackability over abstraction, making it straightforward to understand, modify, and benchmark RL algorithms without navigating complex class hierarchies.
What CleanRL Does
- Implements PPO, DQN, SAC, TD3, A2C, DDPG, and other standard RL algorithms
- Each algorithm is a single self-contained Python file with no hidden base classes
- Provides tracked experiment results with Weights & Biases integration
- Supports Atari, MuJoCo, continuous control, and multi-agent environments
- Includes RLHF implementations for language model alignment
Architecture Overview
Each CleanRL file follows a consistent structure: argument parsing, environment creation, network definition, training loop, and logging. There is no shared base class or abstract trainer. This means modifying an algorithm requires editing only one file, and understanding the code requires reading only that file. Utility dependencies are limited to gymnasium, PyTorch, and optional logging backends.
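The single-file layout described above can be illustrated with a minimal sketch. This is not CleanRL code: the flag names, `ToyEnv`, and the trivial "policy" are hypothetical stand-ins (pure stdlib, no gymnasium or PyTorch) chosen only to show the argument parsing → environment → training loop → logging ordering in one file:

```python
import argparse
import random


def parse_args():
    # Argument parsing: every hyperparameter is a CLI flag with a default,
    # mirroring the style of CleanRL's single-file scripts.
    parser = argparse.ArgumentParser()
    parser.add_argument("--seed", type=int, default=1)
    parser.add_argument("--total-timesteps", type=int, default=100)
    parser.add_argument("--learning-rate", type=float, default=2.5e-4)
    return parser.parse_args()


class ToyEnv:
    # Environment creation: a stand-in for gymnasium.make(env_id).
    def reset(self, seed=None):
        self.rng = random.Random(seed)
        return self.rng.random()

    def step(self, action):
        obs = self.rng.random()
        reward = 1.0 if action == round(obs) else 0.0
        return obs, reward, False


def train(seed=1, total_timesteps=100):
    # Training loop: env interaction, updates, and logging all live in the
    # same function, in the same file -- no base class to consult.
    env = ToyEnv()
    obs = env.reset(seed=seed)
    returns = 0.0
    for step in range(total_timesteps):
        action = round(obs)  # "policy": a trivial deterministic rule
        obs, reward, done = env.step(action)
        returns += reward
        # Logging would go here (stdout, TensorBoard, or W&B).
    return returns


if __name__ == "__main__":
    args = parse_args()
    print(train(args.seed, args.total_timesteps))
```

Reading top to bottom covers everything the script does, which is the property the single-file design optimizes for.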
Self-Hosting & Configuration
- Install via pip: `pip install cleanrl`, with optional extras for Atari or MuJoCo
- All hyperparameters exposed as CLI arguments with sensible defaults
- Configure logging to Weights & Biases, TensorBoard, or stdout
- Each file is independently runnable without package installation
- GPU usage is automatic when CUDA is available
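A typical install-and-run session might look like the following. The `--env-id`, `--seed`, `--total-timesteps`, and `--track` flags follow CleanRL's documented CLI; treat the exact extras name and script path as illustrative:

```shell
# Install the package, optionally with extras for specific environment suites
pip install cleanrl
pip install "cleanrl[atari]"   # extras name is illustrative

# Or run a single file directly from a clone -- no package install needed
git clone https://github.com/vwxyzjn/cleanrl.git && cd cleanrl
python cleanrl/ppo.py --env-id CartPole-v1 --seed 1 --total-timesteps 50000

# Enable Weights & Biases tracking with a flag
python cleanrl/ppo.py --env-id CartPole-v1 --track
```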
Key Features
- Single-file design: entire algorithm in one readable script
- Documented hyperparameters matching original paper implementations
- Reproducible results with seeded environments and tracked experiments
- Cloud integration with W&B for experiment comparison
- RLHF implementations (PPO for LLMs) bridging RL and language modeling
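The reproducibility claim above rests on seeding every source of randomness once at the start of a run. A stdlib-only sketch of the pattern (`make_rollout` is a hypothetical helper; real CleanRL files seed `random`, numpy, and torch, and pass the seed to the environment as well):

```python
import random


def make_rollout(seed: int, steps: int = 5):
    # Seed the RNG once up front, as CleanRL scripts do for random,
    # numpy, and torch (only stdlib random is shown here).
    rng = random.Random(seed)
    # Generate a trajectory of pseudo-observations; with a fixed seed the
    # sequence is identical across runs, which is what makes tracked
    # experiments comparable.
    return [rng.random() for _ in range(steps)]


# Two runs with the same seed produce identical trajectories...
assert make_rollout(seed=42) == make_rollout(seed=42)
# ...while different seeds diverge.
assert make_rollout(seed=42) != make_rollout(seed=7)
```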
Comparison with Similar Tools
- Stable-Baselines3 — object-oriented RL library with reusable components; more abstraction
- RLlib (Ray) — distributed RL framework; much more complex but scales to clusters
- Tianshou — modular RL library; more structured than CleanRL but less transparent
- SpinningUp (OpenAI) — educational RL implementations; CleanRL covers more algorithms
- rl-games — high-performance RL; optimized for speed over readability
FAQ
Q: Why single-file implementations instead of a modular library? A: Modularity helps large projects but hinders understanding. Single files let you read the entire algorithm top-to-bottom without jumping between files.
Q: Are the implementations faithful to the original papers? A: Yes. Each implementation documents which paper it follows and reproduces reported benchmark scores.
Q: Can I use CleanRL for my research paper? A: Yes. Many papers use CleanRL as their baseline implementation. Results are tracked and reproducible.
Q: Does CleanRL support multi-GPU training? A: Some implementations support distributed training, but the primary focus is single-GPU clarity and correctness.