# LLaMA-Factory — Unified LLM Fine-Tuning Framework

> LLaMA-Factory offers a web UI and CLI for fine-tuning over 100 large language models using methods like LoRA, QLoRA, and full-parameter training, with built-in evaluation and export.

## Quick Use

```bash
pip install llamafactory
llamafactory-cli train examples/lora_single_gpu/llama3_lora_sft.yaml

# Or launch the web UI:
llamafactory-cli webui
```

## Introduction

LLaMA-Factory is a unified framework that makes fine-tuning large language models accessible through both a command-line interface and a browser-based web UI called LlamaBoard. It supports over 100 model architectures and multiple training methods, removing the need to write boilerplate training code.

## What LLaMA-Factory Does

- Provides LoRA, QLoRA, full-parameter, and freeze-tuning methods for any supported model
- Includes LlamaBoard, a no-code web UI for dataset management, training, and evaluation
- Supports RLHF, DPO, PPO, and other alignment techniques out of the box
- Handles multi-GPU and distributed training via DeepSpeed and FSDP
- Exports fine-tuned models to GGUF, vLLM, and other serving formats

## Architecture Overview

LLaMA-Factory wraps the Hugging Face Transformers and PEFT libraries, adding a configuration-driven layer that maps YAML files to training pipelines. The core engine resolves model adapters, datasets, and training strategies at runtime, so switching from LoRA to full fine-tuning only requires changing a config key. LlamaBoard communicates with this engine via a local API server.
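The config-driven layer described above can be illustrated with a minimal training YAML. This is a hedged sketch modeled on the repository's `examples/lora_single_gpu/llama3_lora_sft.yaml`; key names and defaults may differ between releases, so check the example configs shipped with your version:

```yaml
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

### method
stage: sft                # supervised fine-tuning
do_train: true
finetuning_type: lora     # change to "full" for full-parameter training
lora_target: all

### dataset
dataset: alpaca_en_demo   # a demo dataset bundled with the repository
template: llama3
cutoff_len: 1024

### output
output_dir: saves/llama3-8b/lora/sft

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
```

Running `llamafactory-cli train path/to/config.yaml` picks the whole pipeline from this file; swapping `finetuning_type: lora` for `full` is the single-key change the architecture overview refers to.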
## Self-Hosting & Configuration

- Install via pip, or clone the repository and run from source
- Configure training jobs using YAML files specifying model, dataset, method, and hyperparameters
- Datasets can be loaded from local files, the Hugging Face Hub, or custom JSON/CSV
- Set CUDA_VISIBLE_DEVICES to control GPU allocation for multi-GPU setups
- Use the Docker image for reproducible environments with pre-installed dependencies

## Key Features

- Supports 100+ model families including LLaMA, Mistral, Qwen, Gemma, and Phi
- Quantized training via QLoRA reduces VRAM requirements to as low as 4 GB for 7B models
- Built-in evaluation with BLEU, ROUGE, and custom metric callbacks
- FlashAttention-2 and Unsloth integration for faster training throughput
- A single YAML config covers model selection, data preprocessing, and the training loop

## Comparison with Similar Tools

- **Axolotl** — more config-driven but less visual; no built-in web UI
- **Unsloth** — focused on speed optimization; narrower model support
- **TRL** — lower-level Hugging Face library; requires more code
- **Ludwig** — declarative ML framework; broader than LLMs but less LLM-specific tuning

## FAQ

**Q: What GPU do I need to fine-tune a 7B model?**
A: With QLoRA, a single GPU with 4-6 GB VRAM is sufficient. Full fine-tuning requires significantly more memory.

**Q: Can I use custom datasets?**
A: Yes. Place JSON or CSV files in the data directory and register them in dataset_info.json with the appropriate column mappings.

**Q: Does it support multi-node training?**
A: Yes. LLaMA-Factory integrates DeepSpeed ZeRO and PyTorch FSDP for distributed training across multiple nodes.

**Q: How do I export a model after training?**
A: Use the export command or the LlamaBoard export tab to merge adapters and save in Hugging Face, GGUF, or vLLM-ready format.
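The custom-dataset answer above can be made concrete with a `dataset_info.json` entry. This is a sketch following the column-mapping scheme documented in the repository's data README; the dataset name `my_dataset` and file `my_data.json` are placeholders, and field names should be verified against your installed version:

```json
{
  "my_dataset": {
    "file_name": "my_data.json",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output"
    }
  }
}
```

Each key under `columns` maps a role LLaMA-Factory expects (prompt, query, response) to the column name actually used in your JSON or CSV file; once registered, the dataset can be referenced by name (`dataset: my_dataset`) in a training config.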
## Sources

- https://github.com/hiyouga/LLaMA-Factory
- https://llamafactory.readthedocs.io/

---

Source: https://tokrepo.com/en/workflows/541c701c-3fda-11f1-9bc6-00163e2b0d79
Author: Script Depot