Scripts · Apr 22, 2026 · 3 min read

Stable Baselines3 — Reliable Reinforcement Learning Implementations in PyTorch

Stable Baselines3 is a set of reliable, well-tested reinforcement learning algorithm implementations in PyTorch, designed as a plug-and-play toolkit for RL research and applications.

Introduction

Stable Baselines3 (SB3) is the PyTorch continuation of the Stable Baselines project. It provides clean, tested implementations of common RL algorithms with a unified API, making it straightforward to train agents, compare algorithms, and integrate with Gymnasium environments.

What Stable Baselines3 Does

  • Implements core RL algorithms: PPO, A2C, SAC, TD3, DQN, HER, and more
  • Provides a unified model.learn() / model.predict() interface across all algorithms
  • Supports custom policies, feature extractors, and callback hooks
  • Handles observation normalization, action clipping, and vectorized environments
  • Includes experiment logging to stdout, CSV, and TensorBoard out of the box, with Weights & Biases available through its SB3 integration

Architecture Overview

Each algorithm subclasses BaseAlgorithm, which manages the environment, policy network, rollout buffer, and training loop. Policies are nn.Module objects composed of feature extractors and action heads. RolloutBuffer (on-policy) and ReplayBuffer (off-policy) store transitions. SB3 uses Gymnasium as its environment interface and supports VecEnv wrappers for parallel rollout collection. Callbacks hook into training for evaluation, checkpointing, and early stopping.

Self-Hosting & Configuration

  • Install via pip: pip install stable-baselines3[extra] (pulls in optional dependencies such as TensorBoard and Atari support; the experimental algorithms ship separately as sb3-contrib)
  • Train with a single call: model = PPO('MlpPolicy', 'LunarLander-v3'); model.learn(100000)
  • Customize the network: pass policy_kwargs=dict(net_arch=[256, 256]) to change hidden layer sizes
  • Save and load models: model.save('ppo_model') and PPO.load('ppo_model')
  • Monitor training: set tensorboard_log="./tb_logs/" for TensorBoard integration

Key Features

  • Clean, readable codebase with high test coverage and type annotations
  • Consistent API: swap PPO for SAC by changing one class name
  • SB3-Contrib extends the base with experimental algorithms (TQC, TRPO, RecurrentPPO)
  • Built-in support for Hindsight Experience Replay (HER) for goal-conditioned tasks
  • Active community with thorough documentation and RL Zoo for hyperparameter benchmarks

Comparison with Similar Tools

  • RLlib (Ray) — distributed RL at scale with more algorithms; SB3 is simpler to install and use on a single machine
  • CleanRL — single-file algorithm implementations for transparency; SB3 provides a reusable library API
  • Tianshou — modular RL framework with more algorithms; SB3 has a larger user base and more tutorials
  • Gymnasium — the environment API that SB3 builds on, not an algorithm library
  • TF-Agents — TensorFlow-based RL; SB3 is PyTorch-native

FAQ

Q: Which algorithm should I start with? A: PPO is a good default for most tasks. Use SAC or TD3 for continuous action spaces, DQN for discrete ones.

Q: Can I use custom Gymnasium environments with SB3? A: Yes. Any environment implementing the Gymnasium Env interface works directly. Use the env checker to validate.

Q: How do I tune hyperparameters? A: The RL Zoo3 companion project provides tuned hyperparameters and an Optuna-based tuning script.

Q: Does SB3 support multi-agent RL? A: Not natively. SB3 targets single-agent environments. For multi-agent, look at PettingZoo with custom wrappers.
