Axolotl — Streamlined LLM Fine-Tuning
Axolotl streamlines post-training and fine-tuning for LLMs: LoRA, QLoRA, DPO, GRPO, and multimodal training, all driven by a single YAML config, with Flash Attention and multi-GPU support. Apache 2.0 licensed; 11.6K+ GitHub stars.
What it is
Axolotl is a framework that streamlines post-training and fine-tuning for large language models. It supports LoRA, QLoRA, full fine-tuning, DPO, GRPO, and multimodal training. You define your entire training run in a single YAML configuration file, and Axolotl handles data loading, tokenization, training, and evaluation.
Axolotl targets ML engineers who want to fine-tune foundation models without writing custom training scripts for each experiment.
How it saves time or tokens
Axolotl eliminates boilerplate training code. Instead of writing data loaders, configuring optimizers, and managing device placement, you specify parameters in YAML and run one command. Switching from LoRA to QLoRA to full fine-tuning is a config change, not a code rewrite.
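For instance, moving from LoRA to QLoRA is a two-line diff (keys as used in Axolotl YAML configs, matching the example later on this page):

```yaml
# LoRA:
adapter: lora

# QLoRA (adds 4-bit quantization of the base model):
adapter: qlora
load_in_4bit: true
```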
Flash Attention integration and multi-GPU support via DeepSpeed mean training runs faster without manual optimization.
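Multi-GPU training is likewise config-driven. A hedged sketch: the `deepspeed` key points Axolotl at a DeepSpeed JSON config (the path below is illustrative; check the Axolotl repo for its shipped presets):

```yaml
# enable DeepSpeed ZeRO stage 2 (path is an assumption, not a guaranteed preset)
deepspeed: deepspeed_configs/zero2.json
```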
How to use
- Install Axolotl: `pip install axolotl`
- Copy an example YAML config for your model and training method
- Customize the config with your dataset, model, and hyperparameters
- Run training: `accelerate launch -m axolotl.cli.train config.yml`
Example
```yaml
# config.yml
base_model: meta-llama/Llama-3-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_4bit: true
adapter: qlora
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05

datasets:
  - path: tatsu-lab/alpaca
    type: alpaca

sequence_len: 4096
micro_batch_size: 2
gradient_accumulation_steps: 4
learning_rate: 0.0002
num_epochs: 3
optimizer: adamw_bnb_8bit
flash_attention: true
wandb_project: my-finetune
```

Run with: `accelerate launch -m axolotl.cli.train config.yml`
Related on TokRepo
- Coding tools -- ML development tools
- Research tools -- AI research frameworks
Common pitfalls
- QLoRA with 4-bit quantization requires bitsandbytes, which only works on NVIDIA GPUs; Apple Silicon and AMD users need different quantization methods
- Incorrect `sequence_len` causes OOM errors or truncated training data; check your dataset's token-length distribution first
- Flash Attention requires a compatible GPU (Ampere or newer); disable it on older hardware to avoid crashes
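To act on the `sequence_len` pitfall above, it helps to audit your dataset's length distribution before training. This sketch uses a rough ~4-characters-per-token heuristic (my assumption, not part of Axolotl's API); for exact counts, substitute the model's actual tokenizer:

```python
# Rough token-length audit for Alpaca-style records before setting sequence_len.
# Assumption: ~4 characters per token as a heuristic; for exact counts,
# tokenize with the model's actual tokenizer instead.

def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def length_report(records, seq_len=4096):
    """Summarize approximate token lengths of instruction + input + output."""
    lengths = sorted(
        approx_tokens(r.get("instruction", "") + r.get("input", "") + r.get("output", ""))
        for r in records
    )
    p95 = lengths[int(0.95 * (len(lengths) - 1))]
    over = sum(1 for n in lengths if n > seq_len)
    return {"max": lengths[-1], "p95": p95, "over_limit": over}

sample = [{"instruction": "a" * 100, "input": "", "output": "b" * 300}]
print(length_report(sample, seq_len=64))  # → {'max': 100, 'p95': 100, 'over_limit': 1}
```

If `over_limit` is large, either raise `sequence_len` (watch VRAM) or accept truncation of the long tail.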
Frequently Asked Questions
How does Axolotl compare to TRL?
TRL provides individual trainer classes (SFTTrainer, DPOTrainer) that you compose in Python code. Axolotl wraps the entire training pipeline in a YAML config with opinionated defaults. Axolotl is faster to get started; TRL gives more programmatic control.
Which models does Axolotl support?
Axolotl supports LLaMA, Mistral, Gemma, Phi, Qwen, and most models available on Hugging Face. Any model compatible with Hugging Face Transformers can be used by specifying the correct model_type in the config.
Can Axolotl train multimodal models?
Yes. Axolotl supports multimodal training for vision-language models. Configure the multimodal dataset format and model type in the YAML config. This feature supports LLaVA-style architectures.
What hardware do I need?
For QLoRA fine-tuning of an 8B model, a single GPU with 24 GB VRAM (RTX 3090, A5000) is sufficient. Full fine-tuning of larger models requires multi-GPU setups. Axolotl integrates with DeepSpeed and FSDP for distributed training.
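The 24 GB figure can be sanity-checked with a back-of-envelope estimate. All constants below are my assumptions, not Axolotl-published numbers:

```python
# Back-of-envelope VRAM estimate for QLoRA on an ~8B-parameter model.
# Assumed constants (not Axolotl numbers):
#   - 4-bit (NF4) base weights: ~0.5 bytes/param
#   - bf16 LoRA adapter weights: 2 bytes/param
#   - 8-bit optimizer states for the adapters: ~2 bytes/param (rough)
#   - activations + CUDA overhead: a flat headroom allowance

def qlora_vram_gb(base_params_b=8.0, lora_params_m=80.0, headroom_gb=6.0):
    base = base_params_b * 0.5            # quantized base model, GB
    adapters = lora_params_m * 2 / 1000   # bf16 LoRA weights, GB
    optimizer = lora_params_m * 2 / 1000  # 8-bit AdamW states, GB
    return base + adapters + optimizer + headroom_gb

print(f"~{qlora_vram_gb():.1f} GB")  # ~10.3 GB: fits a 24 GB card with margin
```

Activation memory scales with `sequence_len` and batch size, so the headroom term dominates the uncertainty; treat the result as an order-of-magnitude check, not a guarantee.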
Does Axolotl support experiment tracking?
Yes. Axolotl integrates with Weights and Biases (wandb) and MLflow. Set the wandb_project or mlflow_tracking_uri in your YAML config to log metrics, hyperparameters, and artifacts automatically.
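A minimal tracking fragment (key names as described above; values are placeholders):

```yaml
wandb_project: my-finetune
# or, for MLflow:
mlflow_tracking_uri: http://localhost:5000
```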
Citations (3)
- Axolotl GitHub — Axolotl streamlines LLM fine-tuning with 11.6K+ GitHub stars
- arXiv — QLoRA: Efficient Finetuning of Quantized LLMs
- arXiv — FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Source & Thanks
Created by Axolotl AI. Licensed under Apache 2.0. axolotl-ai-cloud/axolotl — 11,600+ GitHub stars