# LLaMA-Factory — Fine-Tune 100+ LLMs with a Unified Interface

> LLaMA-Factory provides a web UI and CLI to fine-tune large language models including LLaMA, Mistral, Qwen, and more using LoRA, QLoRA, and full-parameter methods without writing training scripts.

## Install

```bash
pip install llamafactory
```

## Quick Use

```bash
llamafactory-cli webui
```

## Introduction

LLaMA-Factory is an open-source framework that makes fine-tuning large language models accessible through a unified web interface and command-line tool. It eliminates the need to write custom training loops by providing pre-built pipelines for supervised fine-tuning, RLHF, DPO, and other post-training methods across a wide range of model architectures.

## What LLaMA-Factory Does

- Supports fine-tuning of 100+ LLM architectures including LLaMA, Mistral, Qwen, Yi, Gemma, and Phi
- Provides a no-code web UI (LLaMA Board) for dataset configuration, training, and evaluation
- Implements LoRA, QLoRA, full-parameter, and GaLore training strategies
- Handles distributed training via DeepSpeed and FSDP out of the box
- Exports fine-tuned models in Hugging Face and GGUF formats for serving with vLLM or llama.cpp

## Architecture Overview

LLaMA-Factory wraps Hugging Face Transformers and PEFT into a unified training pipeline. A YAML-based configuration system maps model names to architecture-specific templates, tokenizer settings, and chat formats. The web UI is built with Gradio, and the CLI dispatches to the same backend. Training jobs run through a custom Trainer class that handles LoRA merging, quantization, and checkpoint management. A minimal training configuration is sketched in the Examples section below.

## Self-Hosting & Configuration

- Install via pip, or clone the repository and run `pip install -e .`
- Launch the web UI with `llamafactory-cli webui` (default port 7860)
- Configure training via YAML files or interactively through the web UI
- Requires PyTorch 2.0+ and a CUDA-capable GPU for training; CPU inference is supported
- Model weights are loaded from the Hugging Face Hub or local paths

## Key Features

- Unified interface across 100+ model families reduces boilerplate
- Built-in quantization (4-bit, 8-bit) enables fine-tuning on consumer GPUs
- Integrated evaluation with BLEU, ROUGE, and custom metrics
- Supports multi-GPU and multi-node distributed training (see the launch sketch under Examples)
- Active community with frequent updates tracking new model releases

## Comparison with Similar Tools

- **Axolotl** — more YAML-driven, less GUI; similar model coverage
- **Unsloth** — focuses on fine-tuning speed and memory optimization; narrower model support
- **TRL** — lower-level library from Hugging Face for RLHF/DPO; requires more code
- **FastChat** — emphasizes serving and evaluation; less training flexibility
- **AutoTrain** — Hugging Face hosted service; less control over hyperparameters

## FAQ

**Q: Can I fine-tune without a GPU?**
A: Training requires a CUDA GPU. For CPU-only machines, use the inference and evaluation features with pre-trained or quantized models.

**Q: How much VRAM do I need for QLoRA?**
A: A 7B model with 4-bit QLoRA typically fits in 6–8 GB of VRAM. Larger models scale accordingly.

**Q: Does it support multi-turn conversation data?**
A: Yes. LLaMA-Factory accepts ShareGPT and Alpaca formats for multi-turn dialogue datasets; see the dataset registration sketch under Examples.

**Q: Can I export to GGUF for llama.cpp?**
A: Yes. The CLI includes an export command that converts merged checkpoints to GGUF format; the merge step is sketched under Examples.
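## Examples

The sketches below are illustrative rather than canonical: they are modeled on the sample configs shipped in the repository's `examples/` directory, but the base model (`meta-llama/Meta-Llama-3-8B-Instruct`), dataset names, and output paths are placeholders, and option names can shift between releases, so verify them against the documentation for your installed version. Each snippet writes its YAML config from a bash heredoc so it can run as a standalone script.

A minimal LoRA supervised fine-tuning run:

```bash
# Minimal LoRA SFT config, modeled on the repository's examples/ directory.
# Key names may differ across LLaMA-Factory versions.
cat > llama3_lora_sft.yaml <<'EOF'
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct  # any HF model ID or local path
stage: sft                 # supervised fine-tuning
do_train: true
finetuning_type: lora
lora_target: all           # attach LoRA adapters to all linear layers
dataset: alpaca_en_demo    # demo dataset bundled with the repository
template: llama3           # chat template matching the base model
cutoff_len: 1024
output_dir: saves/llama3-8b/lora/sft
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
logging_steps: 10
save_steps: 500
bf16: true
EOF

# Launch training; the CLI dispatches to the same backend as the web UI.
llamafactory-cli train llama3_lora_sft.yaml
```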
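Training on your own data means registering it first: LLaMA-Factory resolves the `dataset:` key through `data/dataset_info.json` in the repository checkout, so this sketch assumes a source install rather than a bare pip install. The `my_dataset` name and column mapping are hypothetical:

```bash
# Write a tiny Alpaca-format dataset: a list of instruction/input/output records.
cat > data/my_dataset.json <<'EOF'
[
  {
    "instruction": "Summarize the following text.",
    "input": "LLaMA-Factory is a unified fine-tuning framework for large language models.",
    "output": "LLaMA-Factory fine-tunes many LLM families through one web UI and CLI."
  }
]
EOF

# Then add an entry like this to data/dataset_info.json so that
# `dataset: my_dataset` resolves in a training config. Multi-turn ShareGPT
# files go through the same registry with a "formatting": "sharegpt" field.
#
#   "my_dataset": {
#     "file_name": "my_dataset.json",
#     "columns": { "prompt": "instruction", "query": "input", "response": "output" }
#   }
```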
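For multi-GPU runs, the same config can be launched through `torchrun`. Recent LLaMA-Factory releases honor a `FORCE_TORCHRUN` environment variable for this; older versions may require invoking `torchrun` directly, so treat the variable as an assumption to check against your version:

```bash
# Train the same config across four GPUs via torchrun.
CUDA_VISIBLE_DEVICES=0,1,2,3 FORCE_TORCHRUN=1 \
  llamafactory-cli train llama3_lora_sft.yaml

# For DeepSpeed ZeRO, reference a DeepSpeed JSON config from the training YAML,
# e.g. by adding a line such as:
#   deepspeed: examples/deepspeed/ds_z3_config.json
# (the path refers to a sample config shipped in the repository).
```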
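To smoke-test the adapter interactively before merging or exporting, the `chat` subcommand loads the LoRA weights on top of the base model. The adapter path assumes the training sketch above:

```bash
# Interactive chat with the trained adapter applied to the base model.
cat > chat_lora.yaml <<'EOF'
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft   # path written by the training run above
template: llama3
finetuning_type: lora
EOF

llamafactory-cli chat chat_lora.yaml
```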
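Finally, merging the adapter into the base weights produces a standalone Hugging Face checkpoint, which can then be served with vLLM or converted to GGUF for llama.cpp. The `export_dir` path is a placeholder:

```bash
# Merge the LoRA adapter into the base model and write a standalone checkpoint.
cat > merge_lora.yaml <<'EOF'
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
template: llama3
finetuning_type: lora
export_dir: models/llama3-8b-sft-merged
EOF

llamafactory-cli export merge_lora.yaml
```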
## Sources

- https://github.com/hiyouga/LLaMA-Factory
- https://llamafactory.readthedocs.io/