Scripts · May 2, 2026 · 3 min read

torchtune — PyTorch-Native LLM Fine-Tuning Library

An official PyTorch library for fine-tuning large language models. torchtune provides composable, tested building blocks for LoRA, QLoRA, full fine-tuning, and DPO on models like LLaMA, Mistral, and Gemma.

Introduction

torchtune is the official PyTorch library for authoring, fine-tuning, and experimenting with LLMs. It provides a clean, modular codebase with no trainer abstractions, giving users full control over the training loop while handling the complexity of modern fine-tuning methods.

What torchtune Does

  • Fine-tunes LLMs using LoRA, QLoRA, full parameter tuning, and DoRA
  • Supports alignment methods including DPO and PPO
  • Provides recipes for single-GPU and multi-GPU distributed training
  • Downloads and converts model weights from Hugging Face Hub
  • Includes dataset utilities for instruction tuning, chat, and preference data

Architecture Overview

torchtune is built around recipes (complete training scripts) and configs (YAML-based hyperparameter files). Model definitions are pure PyTorch nn.Modules with no framework abstractions. LoRA and quantization are applied as composable transforms on the model layers. The library uses PyTorch Distributed for multi-GPU training and integrates with torchao for quantization-aware training.
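To make the "composable transform" idea concrete, here is a minimal pure-Python sketch of the LoRA math that such a transform applies to a linear layer: the frozen weight `W` is left untouched and a low-rank update `(alpha / r) * (x @ A) @ B` is added to the output. This is an illustration of the technique, not torchtune's actual implementation; the function names and tiny matrix sizes are invented for the example.

```python
# Minimal LoRA sketch in pure Python (illustrative only, not torchtune code).
# W: frozen base weight (in_dim x out_dim), A: (in_dim x r), B: (r x out_dim).

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def lora_forward(x, W, A, B, alpha=16, r=2):
    """y = x @ W + (alpha / r) * (x @ A) @ B -- a LoRA-augmented linear layer."""
    base = matmul(x, W)                      # frozen base projection
    delta = matmul(matmul(x, A), B)          # low-rank update path
    scale = alpha / r
    return [[base[i][j] + scale * delta[i][j]
             for j in range(len(base[0]))]
            for i in range(len(base))]

# LoRA conventionally initializes B to zero, so at the start of training the
# augmented layer computes exactly the same output as the frozen base layer.
x = [[1.0, 2.0]]
W = [[0.5, 0.0], [0.0, 0.5]]
A = [[1.0, 0.0], [0.0, 1.0]]
B = [[0.0, 0.0], [0.0, 0.0]]   # zero-init: no change to the base model yet
y = lora_forward(x, W, A, B)
```

Only `A` and `B` receive gradients during training, which is why LoRA needs optimizer state for just a small fraction of the model's parameters.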

Self-Hosting & Configuration

  • Requires Python 3.9+ and PyTorch 2.4+
  • Install via pip; no custom CUDA compilation needed
  • YAML configs control model, dataset, optimizer, and training parameters
  • tune CLI handles downloads, training, evaluation, and quantization
  • A single consumer GPU (24 GB) is sufficient for LoRA on 7B models; QLoRA fits in even less memory
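As a sketch of what such a YAML config looks like, the fragment below follows the `_component_` convention that torchtune configs use to point at model, dataset, and optimizer builders. The specific component paths and hyperparameter values here are illustrative assumptions and may differ across torchtune releases; consult the bundled configs (`tune ls`) for the real ones.

```yaml
# Illustrative QLoRA config sketch -- component paths are examples,
# not guaranteed to match your installed torchtune version.
model:
  _component_: torchtune.models.llama3.qlora_llama3_8b
  lora_rank: 8
  lora_alpha: 16

dataset:
  _component_: torchtune.datasets.alpaca_dataset

optimizer:
  _component_: torch.optim.AdamW
  lr: 3e-4

batch_size: 2
epochs: 1
gradient_accumulation_steps: 16
```

A config like this is launched through the CLI (for example `tune run <recipe> --config <file>`), and individual keys can typically be overridden from the command line.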

Key Features

  • No hidden trainer class; recipes are readable end-to-end training scripts
  • Supports LLaMA 2/3, Mistral, Gemma, Phi, and Qwen model families
  • Memory-efficient training via LoRA, QLoRA, activation checkpointing, and gradient accumulation
  • Integrated with Weights & Biases and TensorBoard for experiment tracking
  • Quantization support via torchao for 4-bit and 8-bit fine-tuning
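The low-bit support above rests on a simple idea: store weights as small integers plus a per-tensor (or per-group) scale. The pure-Python sketch below shows symmetric 8-bit round-trip quantization to illustrate the principle; it is a toy, not torchao's implementation (which uses per-group 4-bit formats such as NF4 and fused kernels).

```python
# Toy symmetric int8 quantization (illustrative; not torchao's actual scheme).

def quantize_int8(values):
    """Map floats to integers in [-127, 127] with a shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate floats from the integer codes."""
    return [v * scale for v in q]

weights = [0.1, -0.5, 0.25, 0.0]
codes, scale = quantize_int8(weights)
approx = dequantize_int8(codes, scale)
# Round-trip error is bounded by half the quantization step (scale / 2).
```

Cutting weights from 16-bit floats to 8 or 4 bits shrinks the model's memory footprint by 2x or 4x, which is what lets QLoRA fine-tune 7B models on consumer GPUs.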

Comparison with Similar Tools

  • Hugging Face TRL — higher-level trainer API; torchtune gives more control with explicit training loops
  • Axolotl — config-driven fine-tuning; torchtune uses transparent recipes instead of a monolithic trainer
  • LLaMA-Factory — broad model and method support; torchtune prioritizes PyTorch-native composability
  • Unsloth — focuses on inference and training speed hacks; torchtune focuses on correctness and modularity

FAQ

Q: Which models does torchtune support? A: LLaMA 2, LLaMA 3, LLaMA 3.2, Mistral, Gemma, Phi-3, Qwen-2.5, and more. New models are added regularly.

Q: Can I use torchtune for pre-training? A: It is designed for fine-tuning. Pre-training recipes are experimental.

Q: How much VRAM do I need for QLoRA on a 7B model? A: Approximately 10-12 GB, which fits on a single 12 GB card such as an RTX 3080 (12 GB) and comfortably on an RTX 4090.
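That 10-12 GB figure is easy to sanity-check with back-of-envelope arithmetic: the 4-bit base weights take `params * 0.5 bytes`, and the rest is activations, the small LoRA adapters with their optimizer state, and CUDA buffers. The sketch below uses an assumed, illustrative overhead constant, not a measured value; real usage varies with sequence length and batch size.

```python
# Rough QLoRA VRAM estimator. The overhead constant is an assumption for
# illustration (activations, LoRA adapter optimizer state, CUDA buffers),
# not a measured figure.

def qlora_vram_gb(n_params_billion, bits=4, overhead_gb=7.0):
    """Estimate VRAM in GB for QLoRA fine-tuning."""
    weights_gb = n_params_billion * bits / 8  # 1B params at 4 bits = 0.5 GB
    return weights_gb + overhead_gb

# 7B model at 4-bit: 3.5 GB of weights plus overhead lands in the 10-12 GB range.
estimate = qlora_vram_gb(7)
```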

Q: Does torchtune support multi-node training? A: Yes, via PyTorch Distributed (FSDP). Multi-node recipes are provided.
