Apr 28, 2026 · 3 min read

LLaMA-Factory — Fine-Tune 100+ LLMs with a Unified Interface

LLaMA-Factory provides a web UI and CLI to fine-tune large language models including LLaMA, Mistral, Qwen, and more using LoRA, QLoRA, and full-parameter methods without writing training scripts.

Introduction

LLaMA-Factory is an open-source framework that makes fine-tuning large language models accessible through a unified web interface and command-line tool. It eliminates the need to write custom training loops by providing pre-built pipelines for supervised fine-tuning, RLHF, DPO, and other post-training methods across a wide range of model architectures.

What LLaMA-Factory Does

  • Supports fine-tuning of 100+ LLM architectures including LLaMA, Mistral, Qwen, Yi, Gemma, and Phi
  • Provides a no-code web UI (LLaMA Board) for dataset configuration, training, and evaluation
  • Implements LoRA, QLoRA, full-parameter, and GaLore training strategies
  • Handles distributed training via DeepSpeed and FSDP out of the box
  • Exports fine-tuned models to Hugging Face and GGUF formats, and serves them through inference backends such as vLLM

Architecture Overview

LLaMA-Factory wraps Hugging Face Transformers and PEFT into a unified training pipeline. A YAML-based configuration system maps model names to architecture-specific templates, tokenizer settings, and chat formats. The web UI is built with Gradio, and the CLI dispatches to the same backend. Training jobs run through a custom Trainer class that handles LoRA merging, quantization, and checkpoint management.
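
The configuration surface is easiest to see in a training file. Below is a minimal sketch of a LoRA SFT config in the style of the project's shipped examples; exact keys and defaults can shift between releases, so verify against the version you install.

    # sketch of a LoRA supervised fine-tuning config (keys follow the
    # repository's example files; confirm against your installed version)
    model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct  # Hub ID or local path
    stage: sft                  # supervised fine-tuning
    do_train: true
    finetuning_type: lora       # train low-rank adapters, not full weights
    lora_target: all            # attach adapters to all linear layers
    dataset: alpaca_en_demo     # name registered in data/dataset_info.json
    template: llama3            # chat template matching the base model
    output_dir: saves/llama3-8b/lora/sft
    per_device_train_batch_size: 1
    gradient_accumulation_steps: 8
    learning_rate: 1.0e-4
    num_train_epochs: 3.0
    bf16: true

Running llamafactory-cli train path/to/this.yaml dispatches the file to the same backend the web UI drives, which is what keeps the two interfaces in sync.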

Self-Hosting & Configuration

  • Install via pip, or clone the repository and run pip install -e . (example commands follow this list)
  • Launch the web UI with llamafactory-cli webui on port 7860
  • Configure training via YAML files or interactively through the web UI
  • Requires PyTorch 2.0+ and a CUDA-capable GPU for training; CPU inference is supported
  • Model weights are loaded from Hugging Face Hub or local paths
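
Putting those steps together, a first session looks roughly like the following (repository URL and subcommands as documented upstream; the config path is one of the repository's shipped examples):

    # clone and install in editable mode
    git clone https://github.com/hiyouga/LLaMA-Factory.git
    cd LLaMA-Factory
    pip install -e .

    # launch the Gradio web UI (defaults to port 7860)
    llamafactory-cli webui

    # or run a training job headlessly from a YAML config
    llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml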

Key Features

  • Unified interface across 100+ model families reduces boilerplate
  • Built-in quantization (4-bit, 8-bit) enables fine-tuning on consumer GPUs (see the snippet after this list)
  • Integrated evaluation with BLEU, ROUGE, and custom metrics
  • Supports multi-GPU and multi-node distributed training
  • Active community with frequent updates tracking new model releases
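
To make the quantization point concrete: switching the earlier LoRA config to QLoRA should only require adding the quantization keys to the training YAML (key names assumed from the project's examples; verify for your version):

    finetuning_type: lora
    quantization_bit: 4                # load the frozen base model in 4-bit
    quantization_method: bitsandbytes  # assumed backend; 8-bit is also supported

Everything else about the run stays the same; only the memory footprint of the frozen base weights changes.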

Comparison with Similar Tools

  • Axolotl — more YAML-driven, less GUI; similar model coverage
  • Unsloth — focuses on training speed and memory-efficient fine-tuning; narrower model support
  • TRL — lower-level library from Hugging Face for RLHF/DPO; requires more code
  • FastChat — emphasizes serving and evaluation; less training flexibility
  • AutoTrain — Hugging Face hosted service; less control over hyperparameters

FAQ

Q: Can I fine-tune without a GPU? A: Training requires a CUDA GPU. For CPU-only machines, use the inference and evaluation features with pre-trained or quantized models.

Q: How much VRAM do I need for QLoRA? A: A 7B model with 4-bit QLoRA typically fits in 6-8 GB of VRAM; requirements grow roughly linearly with parameter count.
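
As a rough sanity check on that figure: 7 billion parameters at 4 bits is 7e9 × 0.5 bytes ≈ 3.5 GB for the frozen base weights alone; the LoRA adapters, their optimizer state, activations, and CUDA overhead account for the rest, which is why 6-8 GB is a practical floor rather than an exact requirement.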

Q: Does it support multi-turn conversation data? A: Yes. LLaMA-Factory accepts ShareGPT and Alpaca formats for multi-turn dialogue datasets.
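
For reference, a single multi-turn record in ShareGPT style looks like the sketch below (these are the field names the format conventionally uses; LLaMA-Factory's dataset_info.json registry lets you remap column names if yours differ):

    {
      "conversations": [
        {"from": "human", "value": "What is LoRA?"},
        {"from": "gpt", "value": "LoRA trains small low-rank adapter matrices..."},
        {"from": "human", "value": "How is that different from full fine-tuning?"},
        {"from": "gpt", "value": "Only the adapters receive gradient updates..."}
      ]
    }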

Q: Can I export to GGUF for llama.cpp? A: Yes. The CLI includes an export command that converts merged checkpoints to GGUF format.
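
A sketch of that flow, assuming the merge keys used in the project's example export configs; depending on your release, the merged Hugging Face checkpoint may need a final pass through llama.cpp's convert script to produce the GGUF file:

    # merge_export.yaml -- merge LoRA weights back into the base model
    # run with: llamafactory-cli export merge_export.yaml
    model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
    adapter_name_or_path: saves/llama3-8b/lora/sft  # LoRA checkpoint from training
    template: llama3
    finetuning_type: lora
    export_dir: models/llama3-8b-merged

From there the merged checkpoint behaves like any other Hugging Face model directory.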
