Unsloth — Fine-Tune LLMs 2x Faster with 80% Less Memory
Fine-tune Llama, Mistral, Gemma, and Qwen models 2x faster using 80% less VRAM. Open-source with no accuracy loss. Train on a single GPU what used to need four.
What it is
Unsloth is an open-source library that accelerates LLM fine-tuning by 2x while reducing VRAM usage by up to 80%. It supports popular model families including Llama, Mistral, Gemma, and Qwen.
The tool targets ML engineers and hobbyists who want to fine-tune large language models on consumer-grade hardware. What previously required four GPUs can now run on a single card without sacrificing model accuracy.
The project is actively maintained and suits both individual developers and teams integrating it into an existing training pipeline. Documentation and community support are available for onboarding.
How it saves time or tokens
Unsloth's custom Triton kernels and memory-efficient training loop cut fine-tuning time roughly in half. The up-to-80% VRAM reduction means you can train a 7B model on a single 24GB GPU instead of needing a multi-GPU setup, which translates directly into lower cloud compute costs and faster iteration cycles.
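To see where the savings come from, here is a back-of-envelope sketch of weight memory alone (illustrative only; real peak usage also includes activations, gradients, and optimizer state):

# Rough weight-memory arithmetic for a 7B-parameter model
params = 7e9
fp16_gb = params * 2 / 1e9    # 16-bit weights: 2 bytes/param  -> ~14.0 GB
int4_gb = params * 0.5 / 1e9  # 4-bit weights: 0.5 bytes/param -> ~3.5 GB
print(f"fp16 weights: ~{fp16_gb:.1f} GB, 4-bit weights: ~{int4_gb:.1f} GB")
# With LoRA, gradients and optimizer state exist only for the small adapter,
# so the quantized base weights dominate and a 24 GB card has headroom.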
For teams evaluating multiple tools in the same category, the clear documentation and active community reduce the time spent on research and troubleshooting. Getting started takes minutes rather than hours of configuration.
How to use
- Install Unsloth via pip with your preferred PyTorch and CUDA versions (command shown below).
- Load a base model (Llama 3, Mistral, Gemma, or Qwen) using the Unsloth fast loading path.
- Apply LoRA adapters with your chosen rank and target modules.
- Train using the standard Hugging Face Trainer or SFTTrainer with your dataset.
- Save or merge the adapter weights back into the base model for deployment (see the training and saving sketch after the example below).
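Step 1 is a single command; the Unsloth README lists pinned installs for specific PyTorch and CUDA combinations:

pip install unsloth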
Example
from unsloth import FastLanguageModel

# Load the base model with 4-bit quantization (QLoRA-style)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters to the attention projections
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank: higher means more capacity but more trainable params
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
    lora_dropout=0,
)
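Steps 4 and 5 continue from the example above. This is a sketch using TRL's SFTTrainer; argument names vary across trl versions, and the dataset contents and output paths are placeholders:

from datasets import Dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# Toy dataset; in practice, load your own with datasets.load_dataset(...)
dataset = Dataset.from_list([{"text": "Example instruction and response."}])

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=60,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()

# Save just the LoRA adapter (small, but needs the base model at inference)...
model.save_pretrained("lora_adapter")
# ...or merge the adapter into the base weights for single-artifact deployment.
# save_pretrained_merged is Unsloth's helper; see its docs for save_method options.
model.save_pretrained_merged("merged_model", tokenizer, save_method="merged_16bit")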
Related on TokRepo
- Local LLM Tools — Compare inference engines that run fine-tuned models locally after training.
- AI Tools for Coding — Explore how fine-tuned models integrate into coding workflows.
Common pitfalls
- Using a CUDA version incompatible with Unsloth's kernels causes silent fallback to slower paths. Check the compatibility matrix before installing.
- Setting the LoRA rank too high negates the memory savings. Start with r=16 and increase only if quality metrics demand it (see the parameter-count sketch after this list).
- Forgetting to merge adapters before deployment means you ship the base model and adapter as separate artifacts, complicating inference setups.
- Diving in without reading the documentation first. Each model family has specific prerequisites, chat templates, and configuration requirements that affect the quality of results.
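For the LoRA-rank pitfall above, a rough parameter-count sketch shows why rank matters (illustrative numbers assuming square 4096-dim projections across 32 layers; real Llama-3 k_proj and v_proj are smaller under grouped-query attention):

hidden = 4096   # hidden dimension of a Llama-style model
layers = 32
targets = 4     # q_proj, k_proj, v_proj, o_proj

def lora_params(r):
    # LoRA adds A (hidden x r) and B (r x hidden) per targeted projection
    return layers * targets * 2 * hidden * r

for r in (16, 64, 256):
    print(f"r={r}: ~{lora_params(r) / 1e6:.1f}M trainable params")
# r=16 gives ~16.8M params; r=256 gives ~268.4M, and gradient plus optimizer
# state scale with it, eroding the memory savings.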
Frequently Asked Questions
Which models does Unsloth support?
Unsloth supports Llama 2 and 3, Mistral, Gemma, Qwen, and other popular model families. The library provides pre-configured fast-loading paths for common model variants, including 4-bit quantized versions.
Does Unsloth work on AMD GPUs or CPUs?
No. Unsloth relies on custom Triton kernels, which run on NVIDIA's CUDA stack, for its speed and memory optimizations. A CUDA-compatible NVIDIA GPU with at least 8GB of VRAM is required; AMD GPUs are not supported.
How does Unsloth reduce VRAM usage?
Unsloth uses gradient checkpointing, custom Triton kernels, and memory-efficient attention implementations to reduce peak VRAM usage. Combined with 4-bit quantization (QLoRA), this allows 7B models to fine-tune on a single 24GB card.
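Unsloth's long-context gradient checkpointing is switched on when adding adapters; the "unsloth" value below comes from the project's notebooks, and the other arguments mirror the earlier example:

# Enable Unsloth's offloaded gradient checkpointing when attaching adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
    lora_dropout=0,
    use_gradient_checkpointing="unsloth",  # offloads activations to save VRAM
)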
Can I use the standard Hugging Face trainers?
Yes. Unsloth models are compatible with the standard Hugging Face Trainer and TRL SFTTrainer classes. You prepare the model with Unsloth's fast loading path, then pass it to the trainer as usual.
Does the speedup cost accuracy?
No. Unsloth's optimizations are mathematically equivalent to standard training. The speedup comes from kernel-level optimizations, not from approximations that would reduce model quality.
Citations (3)
- Unsloth GitHub — 2x faster fine-tuning with 80% less VRAM
- Unsloth Documentation — QLoRA and 4-bit quantization support
- LoRA Paper (Hu et al.) — LoRA adapter fine-tuning methodology
Source & Thanks
Created by Unsloth AI. Licensed under Apache 2.0.
unslothai/unsloth — 25k+ stars