Scripts · Apr 24, 2026 · 3 min read

LLaMA-Factory — Unified LLM Fine-Tuning Framework

LLaMA-Factory offers a web UI and CLI for fine-tuning over 100 large language models using methods like LoRA, QLoRA, and full-parameter training, with built-in evaluation and export.

Introduction

LLaMA-Factory is a unified framework that makes fine-tuning large language models accessible through both a command-line interface and a browser-based web UI called LlamaBoard. It supports over 100 model architectures and multiple training methods, removing the need to write boilerplate training code.

What LLaMA-Factory Does

  • Provides LoRA, QLoRA, full-parameter, and freeze-tuning methods for any supported model
  • Includes LlamaBoard, a no-code web UI for dataset management, training, and evaluation
  • Supports RLHF, DPO, PPO, and other alignment techniques out of the box
  • Handles multi-GPU and distributed training via DeepSpeed and FSDP
  • Exports fine-tuned models to GGUF, vLLM, and other serving formats

Architecture Overview

LLaMA-Factory wraps Hugging Face Transformers and PEFT libraries, adding a configuration-driven layer that maps YAML files to training pipelines. The core engine resolves model adapters, datasets, and training strategies at runtime, so switching from LoRA to full fine-tuning only requires changing a config key. LlamaBoard communicates with this engine via a local API server.
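As a sketch of this config-driven approach, a minimal LoRA SFT config might look like the following (key names follow the project's published example configs; treat exact keys and values as illustrative and verify them against your installed version):

```yaml
# Illustrative LoRA SFT config. Switching to full fine-tuning
# only changes finetuning_type (and drops the LoRA-specific keys).
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
stage: sft
do_train: true
finetuning_type: lora        # change to "full" for full-parameter training
lora_target: all
dataset: alpaca_en_demo
template: llama3
output_dir: saves/llama3-8b/lora/sft
per_device_train_batch_size: 1
learning_rate: 1.0e-4
num_train_epochs: 3.0
```

Because the engine resolves the training strategy from this file at runtime, the rest of the pipeline (data loading, adapters, logging) stays unchanged when the method key changes.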

Self-Hosting & Configuration

  • Install via pip or clone the repository and run from source
  • Configure training jobs using YAML files specifying model, dataset, method, and hyperparameters
  • Datasets can be loaded from local files, Hugging Face Hub, or custom JSON/CSV
  • Set CUDA_VISIBLE_DEVICES to control GPU allocation for multi-GPU setups
  • Use the Docker image for reproducible environments with pre-installed dependencies
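A typical setup flow, sketched as shell commands (package and CLI names follow the project's README; assume a recent release):

```
# Install from PyPI (or clone the repository and `pip install -e .`)
pip install llamafactory

# Launch the LlamaBoard web UI in a browser
llamafactory-cli webui

# Or run a training job from a YAML config, restricted to GPUs 0 and 1
CUDA_VISIBLE_DEVICES=0,1 llamafactory-cli train my_sft_config.yaml
```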

Key Features

  • Supports 100+ model families including LLaMA, Mistral, Qwen, Gemma, and Phi
  • Quantized training via QLoRA reduces VRAM to as low as 4 GB for 7B models
  • Built-in evaluation with BLEU, ROUGE, and custom metric callbacks
  • FlashAttention-2 and Unsloth integration for faster training throughput
  • Single YAML config covers model selection, data preprocessing, and training loop
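The 4 GB figure for QLoRA on a 7B model can be sanity-checked with back-of-envelope arithmetic (illustrative only; real usage adds activations, LoRA optimizer state, and CUDA overhead on top of the quantized weights):

```python
def qlora_weight_memory_gb(n_params: float, bits: int = 4) -> float:
    """Memory needed for base weights quantized to `bits` bits, in GB."""
    return n_params * bits / 8 / 1e9

# 7B parameters at 4 bits each: ~3.5 GB for the frozen base weights,
# which is why a small consumer GPU can hold the model.
print(f"{qlora_weight_memory_gb(7e9):.1f} GB")  # → 3.5 GB
```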

Comparison with Similar Tools

  • Axolotl — similarly YAML-driven, but CLI-only with no built-in web UI
  • Unsloth — focused on speed optimization; narrower model support
  • TRL — lower-level Hugging Face library; requires more code
  • Ludwig — declarative ML framework; broader than LLMs but less LLM-specific tuning

FAQ

Q: What GPU do I need to fine-tune a 7B model? A: With QLoRA, a single GPU with 4-6 GB VRAM is sufficient. Full fine-tuning requires significantly more memory.

Q: Can I use custom datasets? A: Yes. Place JSON or CSV files in the data directory and register them in dataset_info.json with the appropriate column mappings.
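A registration entry for a custom alpaca-style file might look like this (field names follow the format described in the repository's data README; treat the dataset name and columns as a hypothetical example):

```json
{
  "my_dataset": {
    "file_name": "my_dataset.json",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output"
    }
  }
}
```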

Q: Does it support multi-node training? A: Yes. LLaMA-Factory integrates DeepSpeed ZeRO and PyTorch FSDP for distributed training across multiple nodes.

Q: How do I export a model after training? A: Use the export command or the LlamaBoard export tab to merge adapters and save in Hugging Face, GGUF, or vLLM-ready format.
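An export is itself configured with a small YAML file and run with the export command, for example `llamafactory-cli export export_config.yaml` (key names based on the project's example export configs; verify against your version):

```yaml
# Illustrative merge-and-export config: merges the LoRA adapter
# into the base model and saves sharded Hugging Face weights.
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
template: llama3
finetuning_type: lora
export_dir: models/llama3-8b-sft-merged
export_size: 2          # shard size in GB
```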
