Scripts · Mar 31, 2026 · 2 min read

TRL — Post-Training LLMs with RLHF & DPO

TRL is a Hugging Face library for post-training foundation models with SFT, GRPO, DPO, and reward modeling. 17.9K+ GitHub stars. Scales from a single GPU to multi-node clusters. Apache 2.0.

TokRepo Picks · Community
Quick Use

Use it first, then decide how deep to go

The commands below show what to install and run first, for both a human user and a coding agent.

# Install
pip install trl

# Fine-tune with CLI (no code needed)
trl sft --model_name_or_path Qwen/Qwen2.5-0.5B \
    --dataset_name trl-lib/Capybara \
    --output_dir sft-output

# Or in Python (loads the same dataset used by the CLI example above)
python -c "
from datasets import load_dataset
from trl import SFTTrainer
dataset = load_dataset('trl-lib/Capybara', split='train')
trainer = SFTTrainer(model='Qwen/Qwen2.5-0.5B', train_dataset=dataset)
trainer.train()
"

Intro

TRL (Transformers Reinforcement Learning) is a Hugging Face library for post-training foundation models using techniques like Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), Direct Preference Optimization (DPO), and reward modeling. With 17,900+ GitHub stars and Apache 2.0 license, TRL scales from single GPU to multi-node clusters via Accelerate and DeepSpeed. It includes a CLI for quick fine-tuning without code, and integrates with PEFT for efficient training on large models.

Best for: ML engineers fine-tuning LLMs with human preference data or custom datasets
Works with: Claude Code, OpenAI Codex, Cursor, Gemini CLI, Windsurf
Ecosystem: Hugging Face Transformers, Accelerate, DeepSpeed, PEFT


Key Features

  • Multiple trainers: SFTTrainer, GRPOTrainer, DPOTrainer, RewardTrainer, PPOTrainer
  • CLI interface: Fine-tune models without writing code
  • Scalable: Single GPU to multi-node clusters via Accelerate and DeepSpeed
  • PEFT integration: LoRA and QLoRA for efficient training on large models
  • Built on Transformers: Full compatibility with Hugging Face ecosystem

FAQ

Q: What is TRL? A: TRL is a Hugging Face library with 17.9K+ stars for post-training LLMs using SFT, DPO, GRPO, and reward modeling. It scales from single GPU to multi-node and includes a no-code CLI. Apache 2.0.

Q: How do I install TRL? A: Run pip install trl. Use the CLI with trl sft --model_name_or_path <model> --dataset_name <dataset> or the Python API with SFTTrainer/DPOTrainer classes.



Source & Thanks

Created by Hugging Face. Licensed under Apache 2.0. huggingface/trl — 17,900+ GitHub stars
