Introduction
LLaMA-Factory is a unified framework that makes fine-tuning large language models accessible through both a command-line interface and a browser-based web UI called LlamaBoard. It supports over 100 model architectures and multiple training methods, removing the need to write boilerplate training code.
What LLaMA-Factory Does
- Provides LoRA, QLoRA, full-parameter, and freeze-tuning methods for any supported model
- Includes LlamaBoard, a no-code web UI for dataset management, training, and evaluation
- Supports alignment methods out of the box, including reward modeling, PPO-based RLHF, and DPO
- Handles multi-GPU and distributed training via DeepSpeed and FSDP
- Exports fine-tuned models by merging adapters into Hugging Face format (servable with vLLM) or converting to GGUF, among other serving targets
Architecture Overview
LLaMA-Factory wraps Hugging Face Transformers and PEFT libraries, adding a configuration-driven layer that maps YAML files to training pipelines. The core engine resolves model adapters, datasets, and training strategies at runtime, so switching from LoRA to full fine-tuning only requires changing a config key. LlamaBoard communicates with this engine via a local API server.
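As an illustration of that configuration-driven layer, a LoRA supervised fine-tuning job can be described in a single YAML file. The sketch below follows the style of the example configs shipped with the repository; exact key names can vary between versions. Switching to full fine-tuning would only mean changing the `finetuning_type` value:

```yaml
# sft_lora.yaml — minimal sketch of a LLaMA-Factory training config
model_name_or_path: meta-llama/Llama-2-7b-hf

stage: sft                  # supervised fine-tuning
do_train: true
finetuning_type: lora       # change to "full" for full-parameter tuning
lora_target: q_proj,v_proj  # modules to attach LoRA adapters to

dataset: alpaca_en
template: llama2
cutoff_len: 1024

output_dir: saves/llama2-7b-lora-sft
per_device_train_batch_size: 2
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
```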
Self-Hosting & Configuration
- Install via pip or clone the repository and run from source
- Configure training jobs using YAML files specifying model, dataset, method, and hyperparameters
- Datasets can be loaded from local files, Hugging Face Hub, or custom JSON/CSV
- Set CUDA_VISIBLE_DEVICES to control GPU allocation for multi-GPU setups
- Use the Docker image for reproducible environments with pre-installed dependencies
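Putting the steps above together, a typical from-source setup might look like this (the config filename is illustrative; check the repository's examples directory for ready-made configs):

```sh
# clone and install from source
git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e .

# restrict the job to GPUs 0 and 1, then launch training from a YAML config
CUDA_VISIBLE_DEVICES=0,1 llamafactory-cli train my_sft_config.yaml

# or start the LlamaBoard web UI instead
llamafactory-cli webui
```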
Key Features
- Supports 100+ model families including LLaMA, Mistral, Qwen, Gemma, and Phi
- Quantized training via QLoRA reduces VRAM to as low as 4 GB for 7B models
- Built-in evaluation with BLEU, ROUGE, and custom metric callbacks
- FlashAttention-2 and Unsloth integration for faster training throughput
- Single YAML config covers model selection, data preprocessing, and training loop
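For instance, QLoRA is enabled by adding a quantization setting on top of an ordinary LoRA config. A sketch, with key names taken from the project's example configs (subject to change across versions):

```yaml
# qlora_sketch.yaml — LoRA fine-tuning over a 4-bit quantized base model
model_name_or_path: meta-llama/Llama-2-7b-hf
quantization_bit: 4         # load the base model in 4-bit precision (QLoRA)
finetuning_type: lora
lora_rank: 8
lora_target: q_proj,v_proj
flash_attn: fa2             # use FlashAttention-2 if it is installed
```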
Comparison with Similar Tools
- Axolotl — similarly config-driven but CLI-only; no built-in web UI
- Unsloth — focused on speed optimization; narrower model support
- TRL — lower-level Hugging Face library; requires more code
- Ludwig — declarative ML framework; broader than LLMs but less LLM-specific tuning
FAQ
Q: What GPU do I need to fine-tune a 7B model? A: With QLoRA, a single GPU with 4-6 GB VRAM is sufficient. Full fine-tuning requires significantly more memory.
Q: Can I use custom datasets? A: Yes. Place JSON or CSV files in the data directory and register them in dataset_info.json with the appropriate column mappings.
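A hypothetical registration for a local alpaca-style JSON file could look like the following; the column mapping mirrors the format documented in the repository's data README, and field names may vary by version:

```json
{
  "my_dataset": {
    "file_name": "my_data.json",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output"
    }
  }
}
```

The dataset is then selected in a training config with `dataset: my_dataset`.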
Q: Does it support multi-node training? A: Yes. LLaMA-Factory integrates DeepSpeed ZeRO and PyTorch FSDP for distributed training across multiple nodes.
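DeepSpeed is typically wired in by pointing the training config at a ZeRO JSON file; a sketch (the path mirrors the example configs shipped with the repository and is illustrative):

```yaml
# fragment of a training config enabling DeepSpeed ZeRO-3
deepspeed: examples/deepspeed/ds_z3_config.json
```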
Q: How do I export a model after training? A: Use the export command or the LlamaBoard export tab to merge adapters and save in Hugging Face format (ready for vLLM) or GGUF.
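Export is also config-driven. A sketch of a merge-and-export config, with key names following the project's example export configs (they may differ across versions):

```yaml
# merge LoRA adapters into the base model and save the result
model_name_or_path: meta-llama/Llama-2-7b-hf
adapter_name_or_path: saves/llama2-7b-lora-sft
template: llama2
finetuning_type: lora
export_dir: models/llama2-7b-merged
export_size: 2              # max shard size in GB
```

This would be run with `llamafactory-cli export` followed by the config path; the merged output is a standard Hugging Face checkpoint.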