# Stable Baselines3 — Reliable Reinforcement Learning Implementations in PyTorch

> Stable Baselines3 is a set of reliable, well-tested reinforcement learning algorithm implementations in PyTorch, designed as a plug-and-play toolkit for RL research and applications.

## Quick Use

```bash
pip install "stable-baselines3[extra]"
python -c "
from stable_baselines3 import PPO
model = PPO('MlpPolicy', 'CartPole-v1', verbose=0)
model.learn(total_timesteps=10000)
obs = model.get_env().reset()
action, _ = model.predict(obs)
print('Trained PPO action:', action)
"
```

## Introduction

Stable Baselines3 (SB3) is the PyTorch continuation of the Stable Baselines project. It provides clean, tested implementations of common RL algorithms with a unified API, making it straightforward to train agents, compare algorithms, and integrate with Gymnasium environments.

## What Stable Baselines3 Does

- Implements core RL algorithms: PPO, A2C, SAC, TD3, DQN, HER, and more
- Provides a unified `model.learn()` / `model.predict()` interface across all algorithms
- Supports custom policies, feature extractors, and callback hooks
- Handles observation normalization, action clipping, and vectorized environments
- Includes experiment logging with TensorBoard, CSV, and Weights & Biases

## Architecture Overview

Each algorithm subclasses `BaseAlgorithm`, which manages the environment, policy network, rollout buffer, and training loop. Policies are `nn.Module` objects composed of feature extractors and action heads. `RolloutBuffer` (on-policy) and `ReplayBuffer` (off-policy) store transitions. SB3 uses Gymnasium as its environment interface and supports `VecEnv` wrappers for parallel rollout collection. Callbacks hook into training for evaluation, checkpointing, and early stopping.
## Self-Hosting & Configuration

- Install via pip: `pip install "stable-baselines3[extra]"` (adds optional dependencies such as TensorBoard and Atari support; the contrib algorithms ship separately as `sb3-contrib`)
- Train with a single call: `model = PPO('MlpPolicy', 'LunarLander-v3'); model.learn(100000)`
- Customize the network: pass `policy_kwargs=dict(net_arch=[256, 256])` to change hidden layer sizes
- Save and load models: `model.save('ppo_model')` and `PPO.load('ppo_model')`
- Monitor training: set `tensorboard_log="./tb_logs/"` for TensorBoard integration

## Key Features

- Clean, readable codebase with high test coverage and type annotations
- Consistent API: swap PPO for SAC by changing one class name
- SB3-Contrib extends the base with experimental algorithms (TQC, TRPO, RecurrentPPO)
- Built-in support for Hindsight Experience Replay (HER) for goal-conditioned tasks
- Active community with thorough documentation and RL Zoo for hyperparameter benchmarks

## Comparison with Similar Tools

- **RLlib (Ray)** — distributed RL at scale with more algorithms; SB3 is simpler to install and use on a single machine
- **CleanRL** — single-file algorithm implementations for transparency; SB3 provides a reusable library API
- **Tianshou** — modular RL framework with more algorithms; SB3 has a larger user base and more tutorials
- **Gymnasium** — the environment API that SB3 builds on, not an algorithm library
- **TF-Agents** — TensorFlow-based RL; SB3 is PyTorch-native

## FAQ

**Q: Which algorithm should I start with?**
A: PPO is a good default for most tasks. Use SAC or TD3 for continuous action spaces, DQN for discrete ones.

**Q: Can I use custom Gymnasium environments with SB3?**
A: Yes. Any environment implementing the Gymnasium `Env` interface works directly. Use the env checker to validate.

**Q: How do I tune hyperparameters?**
A: The RL Zoo3 companion project provides tuned hyperparameters and an Optuna-based tuning script.

**Q: Does SB3 support multi-agent RL?**
A: Not natively. SB3 targets single-agent environments.
For multi-agent settings, look at PettingZoo with custom wrappers.

## Sources

- https://github.com/DLR-RM/stable-baselines3
- https://stable-baselines3.readthedocs.io/