# LLaMA-Factory — Fine-Tune 100+ LLMs with a Unified Interface

> LLaMA-Factory provides a web UI and CLI to fine-tune large language models including LLaMA, Mistral, Qwen, and more using LoRA, QLoRA, and full-parameter methods without writing training scripts.

## Install

```bash
pip install llamafactory
```

## Quick Use

```bash
llamafactory-cli webui
```

## Introduction

LLaMA-Factory is an open-source framework that makes fine-tuning large language models accessible through a unified web interface and command-line tool. It eliminates the need to write custom training loops by providing pre-built pipelines for supervised fine-tuning, RLHF, DPO, and other post-training methods across a wide range of model architectures.

## What LLaMA-Factory Does

- Supports fine-tuning of 100+ LLM architectures including LLaMA, Mistral, Qwen, Yi, Gemma, and Phi
- Provides a no-code web UI (LLaMA Board) for dataset configuration, training, and evaluation
- Implements LoRA, QLoRA, full-parameter, and GaLore training strategies
- Handles distributed training via DeepSpeed and FSDP out of the box
- Exports fine-tuned models in Hugging Face and GGUF formats for serving with vLLM or llama.cpp

## Architecture Overview

LLaMA-Factory wraps Hugging Face Transformers and PEFT into a unified training pipeline. A YAML-based configuration system maps model names to architecture-specific templates, tokenizer settings, and chat formats. The web UI is built with Gradio, and the CLI dispatches to the same backend. Training jobs run through a custom Trainer class that handles LoRA merging, quantization, and checkpoint management. A minimal training configuration is sketched in the Examples section below.

## Self-Hosting & Configuration

- Install via pip, or clone the repository and run `pip install -e .`
- Launch the web UI with `llamafactory-cli webui` (default port 7860)
- Configure training via YAML files or interactively through the web UI
- Requires PyTorch 2.0+ and a CUDA-capable GPU for training; CPU inference is supported
- Model weights are loaded from the Hugging Face Hub or local paths

## Key Features

- Unified interface across 100+ model families reduces boilerplate
- Built-in quantization (4-bit, 8-bit) enables fine-tuning on consumer GPUs
- Integrated evaluation with BLEU, ROUGE, and custom metrics
- Supports multi-GPU and multi-node distributed training (see the launch sketch under Examples)
- Active community with frequent updates tracking new model releases

## Comparison with Similar Tools

- **Axolotl** — more YAML-driven, less GUI; similar model coverage
- **Unsloth** — focuses on fine-tuning speed and memory optimization; narrower model support
- **TRL** — lower-level library from Hugging Face for RLHF/DPO; requires more code
- **FastChat** — emphasizes serving and evaluation; less training flexibility
- **AutoTrain** — Hugging Face hosted service; less control over hyperparameters

## FAQ

**Q: Can I fine-tune without a GPU?**
A: Training requires a CUDA GPU. For CPU-only machines, use the inference and evaluation features with pre-trained or quantized models.

**Q: How much VRAM do I need for QLoRA?**
A: A 7B model with 4-bit QLoRA typically fits in 6–8 GB of VRAM. Larger models scale accordingly.

**Q: Does it support multi-turn conversation data?**
A: Yes. LLaMA-Factory accepts ShareGPT and Alpaca formats for multi-turn dialogue datasets; see the dataset registration sketch under Examples.

**Q: Can I export to GGUF for llama.cpp?**
A: Yes. The CLI includes an export command that converts merged checkpoints to GGUF format; the merge step is sketched under Examples.
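## Examples

The sketches below are illustrative rather than canonical: they are modeled on the sample configs shipped in the repository's `examples/` directory, but the base model (`meta-llama/Meta-Llama-3-8B-Instruct`), dataset names, and output paths are placeholders, and option names can shift between releases, so verify them against the documentation for your installed version. Each snippet writes its YAML config from a bash heredoc so it can run as a standalone script.

A minimal LoRA supervised fine-tuning run:

```bash
# Minimal LoRA SFT config, modeled on the repository's examples/ directory.
# Key names may differ across LLaMA-Factory versions.
cat > llama3_lora_sft.yaml <<'EOF'
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct  # any HF model ID or local path
stage: sft                 # supervised fine-tuning
do_train: true
finetuning_type: lora
lora_target: all           # attach LoRA adapters to all linear layers
dataset: alpaca_en_demo    # demo dataset bundled with the repository
template: llama3           # chat template matching the base model
cutoff_len: 1024
output_dir: saves/llama3-8b/lora/sft
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
logging_steps: 10
save_steps: 500
bf16: true
EOF

# Launch training; the CLI dispatches to the same backend as the web UI.
llamafactory-cli train llama3_lora_sft.yaml
```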
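Training on your own data means registering it first: LLaMA-Factory resolves the `dataset:` key through `data/dataset_info.json` in the repository checkout, so this sketch assumes a source install rather than a bare pip install. The `my_dataset` name and column mapping are hypothetical:

```bash
# Write a tiny Alpaca-format dataset: a list of instruction/input/output records.
cat > data/my_dataset.json <<'EOF'
[
  {
    "instruction": "Summarize the following text.",
    "input": "LLaMA-Factory is a unified fine-tuning framework for large language models.",
    "output": "LLaMA-Factory fine-tunes many LLM families through one web UI and CLI."
  }
]
EOF

# Then add an entry like this to data/dataset_info.json so that
# `dataset: my_dataset` resolves in a training config. Multi-turn ShareGPT
# files go through the same registry with a "formatting": "sharegpt" field.
#
#   "my_dataset": {
#     "file_name": "my_dataset.json",
#     "columns": { "prompt": "instruction", "query": "input", "response": "output" }
#   }
```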
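For multi-GPU runs, the same config can be launched through `torchrun`. Recent LLaMA-Factory releases honor a `FORCE_TORCHRUN` environment variable for this; older versions may require invoking `torchrun` directly, so treat the variable as an assumption to check against your version:

```bash
# Train the same config across four GPUs via torchrun.
CUDA_VISIBLE_DEVICES=0,1,2,3 FORCE_TORCHRUN=1 \
  llamafactory-cli train llama3_lora_sft.yaml

# For DeepSpeed ZeRO, reference a DeepSpeed JSON config from the training YAML,
# e.g. by adding a line such as:
#   deepspeed: examples/deepspeed/ds_z3_config.json
# (the path refers to a sample config shipped in the repository).
```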
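To smoke-test the adapter interactively before merging or exporting, the `chat` subcommand loads the LoRA weights on top of the base model. The adapter path assumes the training sketch above:

```bash
# Interactive chat with the trained adapter applied to the base model.
cat > chat_lora.yaml <<'EOF'
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft   # path written by the training run above
template: llama3
finetuning_type: lora
EOF

llamafactory-cli chat chat_lora.yaml
```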
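Finally, merging the adapter into the base weights produces a standalone Hugging Face checkpoint, which can then be served with vLLM or converted to GGUF for llama.cpp. The `export_dir` path is a placeholder:

```bash
# Merge the LoRA adapter into the base model and write a standalone checkpoint.
cat > merge_lora.yaml <<'EOF'
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
template: llama3
finetuning_type: lora
export_dir: models/llama3-8b-sft-merged
EOF

llamafactory-cli export merge_lora.yaml
```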
## Sources

- https://github.com/hiyouga/LLaMA-Factory
- https://llamafactory.readthedocs.io/