# Stable Baselines3 — Reliable Reinforcement Learning Implementations in PyTorch

> Stable Baselines3 is a set of reliable, well-tested reinforcement learning algorithm implementations in PyTorch, designed as a plug-and-play toolkit for RL research and applications.

## Quick Use

```bash
pip install "stable-baselines3[extra]"
python -c "
from stable_baselines3 import PPO
model = PPO('MlpPolicy', 'CartPole-v1', verbose=0)
model.learn(total_timesteps=10000)
obs = model.get_env().reset()
action, _ = model.predict(obs)
print('Trained PPO action:', action)
"
```

## Introduction

Stable Baselines3 (SB3) is the PyTorch continuation of the Stable Baselines project. It provides clean, tested implementations of common RL algorithms with a unified API, making it straightforward to train agents, compare algorithms, and integrate with Gymnasium environments.

## What Stable Baselines3 Does

- Implements core RL algorithms: PPO, A2C, SAC, TD3, DQN, HER, and more
- Provides a unified `model.learn()` / `model.predict()` interface across all algorithms
- Supports custom policies, feature extractors, and callback hooks
- Handles observation normalization, action clipping, and vectorized environments
- Includes experiment logging with TensorBoard, CSV, and Weights & Biases

## Architecture Overview

Each algorithm subclasses `BaseAlgorithm`, which manages the environment, policy network, rollout buffer, and training loop. Policies are `nn.Module` objects composed of feature extractors and action heads. `RolloutBuffer` (on-policy) and `ReplayBuffer` (off-policy) store transitions. SB3 uses Gymnasium as its environment interface and supports `VecEnv` wrappers for parallel rollout collection. Callbacks hook into training for evaluation, checkpointing, and early stopping.
## Self-Hosting & Configuration

- Install via pip: `pip install "stable-baselines3[extra]"` (adds optional dependencies such as TensorBoard and Atari support; the contrib algorithms ship separately as `sb3-contrib`)
- Train with a single call: `model = PPO('MlpPolicy', 'LunarLander-v3'); model.learn(100000)`
- Customize the network: pass `policy_kwargs=dict(net_arch=[256, 256])` to change hidden layer sizes
- Save and load models: `model.save('ppo_model')` and `PPO.load('ppo_model')`
- Monitor training: set `tensorboard_log="./tb_logs/"` for TensorBoard integration

## Key Features

- Clean, readable codebase with high test coverage and type annotations
- Consistent API: swap PPO for SAC by changing one class name
- SB3-Contrib extends the base with experimental algorithms (TQC, TRPO, RecurrentPPO)
- Built-in support for Hindsight Experience Replay (HER) for goal-conditioned tasks
- Active community with thorough documentation and RL Zoo for hyperparameter benchmarks

## Comparison with Similar Tools

- **RLlib (Ray)** — distributed RL at scale with more algorithms; SB3 is simpler to install and use on a single machine
- **CleanRL** — single-file algorithm implementations for transparency; SB3 provides a reusable library API
- **Tianshou** — modular RL framework with more algorithms; SB3 has a larger user base and more tutorials
- **Gymnasium** — the environment API that SB3 builds on, not an algorithm library
- **TF-Agents** — TensorFlow-based RL; SB3 is PyTorch-native

## FAQ

**Q: Which algorithm should I start with?**
A: PPO is a good default for most tasks. Use SAC or TD3 for continuous action spaces, DQN for discrete ones.

**Q: Can I use custom Gymnasium environments with SB3?**
A: Yes. Any environment implementing the Gymnasium `Env` interface works directly. Use the env checker to validate.

**Q: How do I tune hyperparameters?**
A: The RL Zoo3 companion project provides tuned hyperparameters and an Optuna-based tuning script.

**Q: Does SB3 support multi-agent RL?**
A: Not natively. SB3 targets single-agent environments.
For multi-agent settings, look at PettingZoo with custom wrappers.

## Sources

- https://github.com/DLR-RM/stable-baselines3
- https://stable-baselines3.readthedocs.io/