Introduction
torchtune is the official PyTorch library for authoring, fine-tuning, and experimenting with LLMs. It provides a clean, modular codebase with no trainer abstractions, giving users full control over the training loop while handling the complexity of modern fine-tuning methods.
What torchtune Does
- Fine-tunes LLMs using LoRA, QLoRA, full parameter tuning, and DoRA
- Supports alignment methods including DPO and PPO
- Provides recipes for single-GPU and multi-GPU distributed training
- Downloads and converts model weights from Hugging Face Hub
- Includes dataset utilities for instruction tuning, chat, and preference data (see the sketch after this list)
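A minimal sketch of the dataset utilities, assuming Llama 2 weights and tokenizer have already been downloaded (the path below is a placeholder):

```python
# Minimal sketch: building an instruction-tuning dataset with torchtune.
from torchtune.models.llama2 import llama2_tokenizer
from torchtune.datasets import alpaca_dataset

# Placeholder path -- point this at the tokenizer.model from your download.
tokenizer = llama2_tokenizer("/tmp/llama2/tokenizer.model")

# Builds a tokenized Alpaca-style instruction dataset.
ds = alpaca_dataset(tokenizer)
print(ds[0])  # a dict of token IDs and labels for one example
```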
Architecture Overview
torchtune is built around recipes (complete training scripts) and configs (YAML-based hyperparameter files). Model definitions are pure PyTorch nn.Modules with no framework abstractions. LoRA and quantization are applied as composable transforms on the model layers. The library uses PyTorch Distributed for multi-GPU training and integrates with torchao for quantization-aware training.
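For example, the LoRA model builders return the same pure-PyTorch module with low-rank adapters injected into the requested layers. A hedged sketch, following torchtune's published Llama 2 builders (the rank and alpha values here are arbitrary):

```python
# LoRA as a composable transform: same architecture, adapters injected.
from torchtune.models.llama2 import llama2_7b, lora_llama2_7b

# A plain PyTorch nn.Module with no framework wrapper.
base_model = llama2_7b()

# The LoRA variant adds low-rank adapters to the attention projections;
# rank and alpha are illustrative values, not recommendations.
lora_model = lora_llama2_7b(
    lora_attn_modules=["q_proj", "v_proj"],
    lora_rank=8,
    lora_alpha=16,
)
```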
Self-Hosting & Configuration
- Requires Python 3.9+ and PyTorch 2.4+
- Install via pip; no custom CUDA compilation needed
- YAML configs control model, dataset, optimizer, and training parameters
- The tune CLI handles downloads, training, evaluation, and quantization (see the workflow sketch after this list)
- A single consumer GPU (24 GB) is sufficient for LoRA fine-tuning of 7B models, and QLoRA lowers the requirement further
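A hedged sketch of the end-to-end CLI workflow (the Hugging Face repo ID, paths, and override values are placeholders; recipe and config names follow torchtune's packaged examples):

```bash
# Download weights from the Hugging Face Hub (repo ID and paths are placeholders).
tune download meta-llama/Llama-2-7b-hf --output-dir /tmp/llama2

# Copy a packaged config for local editing.
tune cp llama2/7B_lora_single_device my_lora_config.yaml

# Launch the recipe; trailing key=value pairs override YAML fields.
tune run lora_finetune_single_device --config my_lora_config.yaml \
    batch_size=4 gradient_accumulation_steps=8
```

Because any YAML field can be overridden on the command line, a single config stays reusable across experiments.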
Key Features
- No hidden trainer class; recipes are readable end-to-end training scripts
- Supports Llama 2/3, Mistral, Gemma, Phi, and Qwen model families
- Memory-efficient training via LoRA, QLoRA, activation checkpointing, and gradient accumulation (see the config sketch after this list)
- Integrated with Weights & Biases and TensorBoard for experiment tracking
- Quantization support via torchao for 4-bit and 8-bit fine-tuning
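As a sketch of how these features are switched on, they map to plain YAML fields in a recipe config. The keys below follow common torchtune recipe configs, but verify them against your recipe's packaged YAML:

```yaml
# Hedged excerpt of a single-device LoRA config; key names follow common
# torchtune recipes -- check your recipe's packaged config for exact keys.
batch_size: 2
gradient_accumulation_steps: 8    # trades optimizer steps for memory
enable_activation_checkpointing: True
dtype: bf16

# Experiment tracking; the logger namespace varies across torchtune versions.
metric_logger:
  _component_: torchtune.training.metric_logging.WandBLogger
  project: my-finetune-run        # placeholder project name
```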
Comparison with Similar Tools
- Hugging Face TRL — higher-level trainer API; torchtune gives more control with explicit training loops
- Axolotl — config-driven fine-tuning; torchtune uses transparent recipes instead of a monolithic trainer
- LLaMA-Factory — broad model and method support; torchtune prioritizes PyTorch-native composability
- Unsloth — focuses on inference and training speed hacks; torchtune focuses on correctness and modularity
FAQ
Q: Which models does torchtune support? A: Llama 2, Llama 3, Llama 3.2, Mistral, Gemma, Phi-3, Qwen2.5, and more. New models are added regularly.
Q: Can I use torchtune for pre-training? A: It is designed for fine-tuning. Pre-training recipes are experimental.
Q: How much VRAM do I need for QLoRA on a 7B model? A: Approximately 10-12 GB: the 4-bit base weights occupy roughly 3.5 GB (7B parameters at 0.5 bytes each), and the rest goes to LoRA adapters, activations, and optimizer state. This fits on a single RTX 3080 (12 GB variant) or RTX 4090.
Q: Does torchtune support multi-node training? A: Yes, via PyTorch Distributed (FSDP). Multi-node recipes are provided.
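A hedged launch sketch for the distributed case, using torchtune's packaged recipe and config names; tune run wraps torchrun, so standard torchrun flags such as --nproc_per_node pass through:

```bash
# Single-node, 4-GPU full fine-tune; scale out by adding torchrun's
# multi-node flags (e.g. --nnodes) to the same command.
tune run --nproc_per_node 4 full_finetune_distributed \
    --config llama2/7B_full
```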