# LLaMA-Factory — Unified LLM Fine-Tuning Framework

> LLaMA-Factory offers a web UI and CLI for fine-tuning over 100 large language models using methods like LoRA, QLoRA, and full-parameter training, with built-in evaluation and export.

## Quick Use

```bash
pip install llamafactory
llamafactory-cli train examples/lora_single_gpu/llama3_lora_sft.yaml

# Or launch the web UI:
llamafactory-cli webui
```

## Introduction

LLaMA-Factory is a unified framework that makes fine-tuning large language models accessible through both a command-line interface and a browser-based web UI called LlamaBoard. It supports over 100 model architectures and multiple training methods, removing the need to write boilerplate training code.

## What LLaMA-Factory Does

- Provides LoRA, QLoRA, full-parameter, and freeze-tuning methods for any supported model
- Includes LlamaBoard, a no-code web UI for dataset management, training, and evaluation
- Supports RLHF, DPO, PPO, and other alignment techniques out of the box
- Handles multi-GPU and distributed training via DeepSpeed and FSDP
- Exports fine-tuned models to GGUF, vLLM, and other serving formats

## Architecture Overview

LLaMA-Factory wraps the Hugging Face Transformers and PEFT libraries, adding a configuration-driven layer that maps YAML files to training pipelines. The core engine resolves model adapters, datasets, and training strategies at runtime, so switching from LoRA to full fine-tuning only requires changing a config key. LlamaBoard communicates with this engine via a local API server.
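The config-driven layer described above can be illustrated with a minimal training YAML. This is a hedged sketch modeled on the repository's `examples/lora_single_gpu/llama3_lora_sft.yaml`; key names and defaults may differ between releases, so check the example configs shipped with your version:

```yaml
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

### method
stage: sft                # supervised fine-tuning
do_train: true
finetuning_type: lora     # change to "full" for full-parameter training
lora_target: all

### dataset
dataset: alpaca_en_demo   # a demo dataset bundled with the repository
template: llama3
cutoff_len: 1024

### output
output_dir: saves/llama3-8b/lora/sft

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
```

Running `llamafactory-cli train path/to/config.yaml` picks the whole pipeline from this file; swapping `finetuning_type: lora` for `full` is the single-key change the architecture overview refers to.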
## Self-Hosting & Configuration

- Install via pip, or clone the repository and run from source
- Configure training jobs using YAML files specifying model, dataset, method, and hyperparameters
- Datasets can be loaded from local files, the Hugging Face Hub, or custom JSON/CSV
- Set CUDA_VISIBLE_DEVICES to control GPU allocation for multi-GPU setups
- Use the Docker image for reproducible environments with pre-installed dependencies

## Key Features

- Supports 100+ model families including LLaMA, Mistral, Qwen, Gemma, and Phi
- Quantized training via QLoRA reduces VRAM requirements to as low as 4 GB for 7B models
- Built-in evaluation with BLEU, ROUGE, and custom metric callbacks
- FlashAttention-2 and Unsloth integration for faster training throughput
- A single YAML config covers model selection, data preprocessing, and the training loop

## Comparison with Similar Tools

- **Axolotl** — more config-driven but less visual; no built-in web UI
- **Unsloth** — focused on speed optimization; narrower model support
- **TRL** — lower-level Hugging Face library; requires more code
- **Ludwig** — declarative ML framework; broader than LLMs but less LLM-specific tuning

## FAQ

**Q: What GPU do I need to fine-tune a 7B model?**
A: With QLoRA, a single GPU with 4-6 GB VRAM is sufficient. Full fine-tuning requires significantly more memory.

**Q: Can I use custom datasets?**
A: Yes. Place JSON or CSV files in the data directory and register them in dataset_info.json with the appropriate column mappings.

**Q: Does it support multi-node training?**
A: Yes. LLaMA-Factory integrates DeepSpeed ZeRO and PyTorch FSDP for distributed training across multiple nodes.

**Q: How do I export a model after training?**
A: Use the export command or the LlamaBoard export tab to merge adapters and save in Hugging Face, GGUF, or vLLM-ready format.
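The custom-dataset answer above can be made concrete with a `dataset_info.json` entry. This is a sketch following the column-mapping scheme documented in the repository's data README; the dataset name `my_dataset` and file `my_data.json` are placeholders, and field names should be verified against your installed version:

```json
{
  "my_dataset": {
    "file_name": "my_data.json",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output"
    }
  }
}
```

Each key under `columns` maps a role LLaMA-Factory expects (prompt, query, response) to the column name actually used in your JSON or CSV file; once registered, the dataset can be referenced by name (`dataset: my_dataset`) in a training config.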
## Sources

- https://github.com/hiyouga/LLaMA-Factory
- https://llamafactory.readthedocs.io/

---

Source: https://tokrepo.com/en/workflows/541c701c-3fda-11f1-9bc6-00163e2b0d79
Author: Script Depot