Scripts · Mar 31, 2026 · 2 min read

TRL — Post-Training LLMs with RLHF & DPO

TRL is a Hugging Face library for post-training foundation models with SFT, GRPO, DPO, and reward modeling. 17.9K+ GitHub stars. Scales from a single GPU to multi-node clusters. Apache 2.0.

TokRepo Picks · Community
Quick Use

Use it first, then decide how deep to go

The commands below show what to install and run first, for both a human user and a coding agent.

# Install
pip install trl

# Fine-tune with CLI (no code needed)
trl sft --model_name_or_path Qwen/Qwen2.5-0.5B \
    --dataset_name trl-lib/Capybara \
    --output_dir sft-output

# Or in Python (loads the same dataset used by the CLI example above)
python -c "
from datasets import load_dataset
from trl import SFTTrainer
dataset = load_dataset('trl-lib/Capybara', split='train')
trainer = SFTTrainer(model='Qwen/Qwen2.5-0.5B', train_dataset=dataset)
trainer.train()
"

Intro

TRL (Transformers Reinforcement Learning) is a Hugging Face library for post-training foundation models using techniques like Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), Direct Preference Optimization (DPO), and reward modeling. With 17,900+ GitHub stars and Apache 2.0 license, TRL scales from single GPU to multi-node clusters via Accelerate and DeepSpeed. It includes a CLI for quick fine-tuning without code, and integrates with PEFT for efficient training on large models.

Best for: ML engineers fine-tuning LLMs with human preference data or custom datasets
Works with: Claude Code, OpenAI Codex, Cursor, Gemini CLI, Windsurf
Ecosystem: Hugging Face Transformers, Accelerate, DeepSpeed, PEFT


Key Features

  • Multiple trainers: SFTTrainer, GRPOTrainer, DPOTrainer, RewardTrainer, PPOTrainer
  • CLI interface: Fine-tune models without writing code
  • Scalable: Single GPU to multi-node clusters via Accelerate and DeepSpeed
  • PEFT integration: LoRA and QLoRA for efficient training on large models
  • Built on Transformers: Full compatibility with Hugging Face ecosystem

FAQ

Q: What is TRL? A: TRL is a Hugging Face library with 17.9K+ stars for post-training LLMs using SFT, DPO, GRPO, and reward modeling. It scales from single GPU to multi-node and includes a no-code CLI. Apache 2.0.

Q: How do I install TRL? A: Run pip install trl. Use the CLI with trl sft --model_name_or_path <model> --dataset_name <dataset> or the Python API with SFTTrainer/DPOTrainer classes.



Source & Thanks

Created by Hugging Face. Licensed under Apache 2.0. huggingface/trl — 17,900+ GitHub stars
