# Agent Lightning — Reinforcement Training for AI Agents

> Open-source framework by Microsoft for training and evaluating AI agents with reinforcement learning, enabling self-improving agentic systems at scale.

## Quick Use

```bash
pip install agent-lightning
agent-lightning train --config examples/basic.yaml
```

## Introduction

Agent Lightning is an open-source framework from Microsoft for training AI agents with reinforcement learning. It provides a structured pipeline for reward modeling, policy optimization, and evaluation, so teams can build agents that improve autonomously from interaction feedback.

## What Agent Lightning Does

- Trains agentic LLMs with RLHF- and DPO-style reward signals
- Provides environment abstractions for multi-step task execution
- Supports distributed training across GPU clusters
- Integrates with popular model backends (Hugging Face, vLLM)
- Offers evaluation harnesses for measuring agent capability over time

## Architecture Overview

Agent Lightning follows a modular trainer-environment-evaluator architecture. The trainer orchestrates policy updates using configurable reward models, while environments expose step-based interfaces for tool use, code execution, or API interaction. Checkpoints and metrics flow through a central experiment tracker compatible with MLflow and Weights & Biases.
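The modular split described in this section can be illustrated with a small, self-contained Python sketch. It is a toy under stated assumptions, not Agent Lightning's interface: the names `CountdownEnv`, `success_reward`, `Policy`, `Trainer`, `evaluate`, and `train_and_evaluate` are all hypothetical, a one-parameter policy replaces the LLM, and REINFORCE with a batch-mean baseline replaces whatever policy-optimization method the framework actually applies. It only shows how the pieces can fit together: the environment exposes a step-based `reset`/`step` interface, a pluggable reward callable scores finished trajectories, the trainer turns scored rollouts into policy updates and logs a metric, and the evaluator measures capability before and after training.

```python
"""Toy trainer-environment-evaluator loop. Every name here is hypothetical, NOT
Agent Lightning's API; a one-parameter policy stands in for the LLM and plain
REINFORCE with a batch-mean baseline stands in for the real optimizer."""
import math
import random
from typing import Callable, List, Tuple


class CountdownEnv:
    """Step-based environment: reach 0 from `start` within `max_steps` steps."""

    def __init__(self, start: int = 3, max_steps: int = 4) -> None:
        self.start, self.max_steps = start, max_steps
        self.counter, self.steps = start, 0

    def reset(self) -> int:
        self.counter, self.steps = self.start, 0
        return self.counter

    def step(self, action: str) -> Tuple[int, bool]:
        """Apply one action; return (observation, done)."""
        self.steps += 1
        if action == "dec":
            self.counter -= 1
        return self.counter, self.counter == 0 or self.steps >= self.max_steps


# A pluggable "reward model": any callable that scores a finished trajectory.
RewardFn = Callable[[List[str], int], float]


def success_reward(actions: List[str], final_obs: int) -> float:
    return 1.0 if final_obs == 0 else 0.0


class Policy:
    """Picks "dec" with probability sigmoid(theta); ignores observations."""

    def __init__(self, rng: random.Random) -> None:
        self.theta, self.rng = 0.0, rng

    def p_dec(self) -> float:
        return 1.0 / (1.0 + math.exp(-self.theta))

    def act(self) -> str:
        return "dec" if self.rng.random() < self.p_dec() else "noop"


def rollout(env: CountdownEnv, policy: Policy) -> Tuple[List[str], int]:
    """Run one episode through the step interface; return actions and final obs."""
    env.reset()
    actions: List[str] = []
    while True:
        actions.append(policy.act())
        obs, done = env.step(actions[-1])
        if done:
            return actions, obs


class Trainer:
    """Turns scored rollouts into policy updates."""

    def __init__(self, env: CountdownEnv, policy: Policy, reward_fn: RewardFn,
                 lr: float = 0.5, batch: int = 32) -> None:
        self.env, self.policy, self.reward_fn = env, policy, reward_fn
        self.lr, self.batch = lr, batch
        self.history: List[float] = []  # mean reward per update, a minimal "tracker"

    def update(self) -> None:
        episodes = [rollout(self.env, self.policy) for _ in range(self.batch)]
        rewards = [self.reward_fn(acts, obs) for acts, obs in episodes]
        baseline = sum(rewards) / len(rewards)
        p, grad = self.policy.p_dec(), 0.0
        for (acts, _), r in zip(episodes, rewards):
            # d/dtheta log pi(a) is (1 - p) for "dec" and -p for "noop".
            score = sum((1.0 - p) if a == "dec" else -p for a in acts)
            grad += (r - baseline) * score
        self.policy.theta += self.lr * grad / len(episodes)
        self.history.append(baseline)


def evaluate(env: CountdownEnv, policy: Policy, reward_fn: RewardFn,
             episodes: int = 1000) -> float:
    """Evaluator: mean reward over sampled rollouts (here, the success rate)."""
    return sum(reward_fn(*rollout(env, policy)) for _ in range(episodes)) / episodes


def train_and_evaluate(seed: int = 0, iterations: int = 200) -> Tuple[float, float]:
    """Measure success rate, train, then measure again."""
    env, policy = CountdownEnv(), Policy(random.Random(seed))
    before = evaluate(env, policy, success_reward)
    trainer = Trainer(env, policy, success_reward)
    for _ in range(iterations):
        trainer.update()
    return before, evaluate(env, policy, success_reward)
```

Running `train_and_evaluate()` should show the sampled success rate climbing from about 0.31 (the untrained policy picks each action with probability 0.5, so roughly 5/16 of episodes succeed) toward 1.0. Replacing `success_reward` with any other callable of the same shape is this toy's analogue of plugging in a custom or learned reward model.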
## Self-Hosting & Configuration

- Install via pip, or clone the repository for development
- Define training configs in YAML (model, environment, reward)
- Requires CUDA-compatible GPUs for training workloads
- Supports multi-node setups via PyTorch distributed or Ray
- Environment variables control logging, checkpointing, and Weights & Biases integration

## Key Features

- Modular reward model architecture supporting custom scoring
- Built-in environments for code generation, web browsing, and tool use
- Scales from single-GPU experimentation to multi-node clusters
- Compatible with LoRA and QLoRA for parameter-efficient fine-tuning
- Tracks training runs with structured metrics and replay buffers

## Comparison with Similar Tools

- **TRL (Hugging Face)** — focuses on single-turn RLHF; Agent Lightning targets multi-step agentic loops
- **OpenRLHF** — strong on raw RLHF but lacks environment abstractions
- **Axolotl** — oriented toward supervised fine-tuning; no RL training loop
- **DeepSpeed-Chat** — lower-level; requires more manual orchestration

## FAQ

**Q: Does Agent Lightning require a custom reward model?**
A: No. It ships with default reward heuristics and supports plugging in external reward APIs or learned reward models.

**Q: Can I train on a single GPU?**
A: Yes, with smaller models and LoRA. Multiple GPUs are recommended for full fine-tuning of models with 7B or more parameters.

**Q: Which base models are supported?**
A: Any Hugging Face-compatible causal LM, including the Llama, Mistral, Qwen, and DeepSeek families.

**Q: Is it production-ready?**
A: The framework is under active development. Microsoft uses it internally for agent research and releases updates regularly.

## Sources

- https://github.com/microsoft/agent-lightning
- https://github.com/microsoft/agent-lightning/blob/main/README.md

---

Source: https://tokrepo.com/en/workflows/asset-79d160b7
Author: Script Depot