# TRL — Post-Training LLMs with RLHF & DPO

> TRL is a Hugging Face library for post-training foundation models. 17.9K+ GitHub stars. SFT, GRPO, DPO, reward modeling. Scales from single GPU to multi-node. Apache 2.0.

## Install

```bash
pip install trl
```

## Quick Use

```bash
# Fine-tune with the CLI (no code needed)
trl sft --model_name_or_path Qwen/Qwen2.5-0.5B \
  --dataset_name trl-lib/Capybara \
  --output_dir sft-output
```

Or in Python:

```python
from datasets import load_dataset
from trl import SFTTrainer
from transformers import AutoModelForCausalLM, AutoTokenizer

dataset = load_dataset("trl-lib/Capybara", split="train")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")

# Recent TRL/Transformers versions use `processing_class` instead of `tokenizer`
trainer = SFTTrainer(model=model, processing_class=tokenizer, train_dataset=dataset)
trainer.train()
```

---

## Intro

TRL (Transformers Reinforcement Learning) is a Hugging Face library for post-training foundation models using techniques such as Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), Direct Preference Optimization (DPO), and reward modeling. With 17,900+ GitHub stars and an Apache 2.0 license, TRL scales from a single GPU to multi-node clusters via Accelerate and DeepSpeed. It includes a CLI for quick fine-tuning without code and integrates with PEFT for efficient training on large models.
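To make the preference-tuning idea concrete, the DPO objective that a trainer like `DPOTrainer` optimizes can be sketched for a single preference pair in plain Python. The log-probability values below are illustrative numbers, not real model outputs; this is a minimal sketch of the loss, not TRL's implementation.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair: -log sigmoid(beta * margin),
    where the margin compares how much more the policy (vs. the frozen
    reference model) favors the chosen response over the rejected one."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log sigmoid(x) == log(1 + exp(-x)), computed stably with log1p
    return math.log1p(math.exp(-logits))

# When the policy favors the chosen answer more than the reference does,
# the margin is positive and the loss drops below log(2) (= loss at zero margin).
loss = dpo_loss(-10.0, -20.0, -12.0, -18.0)
```

Driving this loss down pushes the policy to increase the likelihood gap between chosen and rejected responses relative to the reference model, which is what keeps DPO stable without an explicit reward model.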
**Best for**: ML engineers fine-tuning LLMs with human preference data or custom datasets
**Works with**: Claude Code, OpenAI Codex, Cursor, Gemini CLI, Windsurf
**Ecosystem**: Hugging Face Transformers, Accelerate, DeepSpeed, PEFT

---

## Key Features

- **Multiple trainers**: `SFTTrainer`, `GRPOTrainer`, `DPOTrainer`, `RewardTrainer`, `PPOTrainer`
- **CLI interface**: fine-tune models without writing code
- **Scalable**: single GPU to multi-node clusters via Accelerate and DeepSpeed
- **PEFT integration**: LoRA and QLoRA for efficient training on large models
- **Built on Transformers**: full compatibility with the Hugging Face ecosystem

---

## FAQ

**Q: What is TRL?**
A: TRL is a Hugging Face library with 17.9K+ stars for post-training LLMs using SFT, DPO, GRPO, and reward modeling. It scales from single GPU to multi-node and includes a no-code CLI. Apache 2.0.

**Q: How do I install TRL?**
A: Run `pip install trl`. Use the CLI with `trl sft --model_name_or_path <model> --dataset_name <dataset>` or the Python API with the `SFTTrainer`/`DPOTrainer` classes.

---

## Source & Thanks

> Created by [Hugging Face](https://github.com/huggingface). Licensed under Apache 2.0.
> [huggingface/trl](https://github.com/huggingface/trl) — 17,900+ GitHub stars

---

Source: https://tokrepo.com/en/workflows/7989ba1c-3daf-4fd1-bf5b-f1b16b6ca990
Author: Script Depot