What is Unsloth?
Unsloth is an open-source library that makes LLM fine-tuning about 2x faster while using up to 80% less GPU memory, with no accuracy loss. It achieves this through hand-written Triton kernels, intelligent memory management, and optimized backpropagation. Fine-tune a 70B model on a single 48GB GPU.
Answer-Ready: Unsloth is an open-source LLM fine-tuning library that achieves 2x speed and 80% VRAM reduction with no accuracy loss. Supports Llama, Mistral, Gemma, Qwen, and Phi models. Fine-tune 70B models on a single GPU. 25k+ GitHub stars.
Best for: ML engineers fine-tuning open-source LLMs with limited GPU resources. Works with: Llama 3, Mistral, Gemma 2, Qwen 2.5, Phi-3. Setup time: Under 5 minutes.
Core Features
1. 2x Faster Training
Custom Triton kernels optimize attention, RoPE, and cross-entropy:
- Standard fine-tuning: 8 hours on A100
- Unsloth fine-tuning: 4 hours on A100 (same accuracy)

2. 80% Less VRAM
| Model | Standard | Unsloth |
|---|---|---|
| Llama 3 8B | 24GB | 6GB |
| Llama 3 70B | 160GB | 48GB |
| Mistral 7B | 20GB | 5GB |
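The table's numbers line up with a back-of-envelope estimate: 4-bit quantization stores each weight in half a byte, versus two bytes for fp16. A toy sketch of that arithmetic (weights only; real training adds activations, gradients, and optimizer state, which is why the table's figures sit a bit above these floors):

```python
def estimate_weight_vram_gb(n_params_billion: float, bits_per_weight: int) -> float:
    """Rough VRAM (GB) needed just to store the model weights."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Llama 3 8B: fp16 weights vs 4-bit quantized weights
fp16_gb = estimate_weight_vram_gb(8, 16)  # 16.0 GB
q4_gb = estimate_weight_vram_gb(8, 4)     # 4.0 GB
```

This is an illustration of why 4-bit loading shrinks the footprint so dramatically, not a substitute for measuring actual usage on your hardware.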
3. Full Training Pipeline
```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from datasets import load_dataset

# Load a 4-bit quantized base model
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Add LoRA adapters to the attention and MLP projections
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Train
dataset = load_dataset("your-dataset", split="train")
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    max_seq_length=2048,
    dataset_text_field="text",
)
trainer.train()

# Save the LoRA adapters
model.save_pretrained("my-finetuned-model")
```

4. Export Formats
```python
# Save as GGUF for Ollama / llama.cpp
model.save_pretrained_gguf("model", tokenizer, quantization_method="q4_k_m")

# Save merged 16-bit weights
model.save_pretrained_merged("model-merged", tokenizer)

# Push to the Hugging Face Hub
model.push_to_hub("username/my-model")
```

5. Supported Models
| Model Family | Sizes |
|---|---|
| Llama 3/3.1/3.2/3.3 | 1B, 3B, 8B, 70B |
| Mistral/Mixtral | 7B, 8x7B |
| Gemma 2 | 2B, 9B, 27B |
| Qwen 2.5 | 0.5B-72B |
| Phi-3/3.5 | 3.8B, 14B |
Google Colab
Free notebook to fine-tune on Colab's T4 GPU:
- Llama 3 8B fine-tuning in 30 minutes
- Uses only 7GB VRAM (fits T4's 15GB)
FAQ
Q: Does it affect model quality? A: No, Unsloth produces mathematically identical results. The speedup comes from kernel optimization, not approximation.
Q: Can I use it for full fine-tuning (not LoRA)? A: Yes, but LoRA with Unsloth gives the best speed/memory ratio.
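The LoRA trade-off is easy to quantify: for a d_out x d_in weight matrix, LoRA at rank r trains only r·(d_out + d_in) parameters instead of d_out·d_in. A quick sketch using a 4096 x 4096 projection (typical of Llama 3 8B attention layers; the exact shapes vary by layer):

```python
def lora_trainable_params(d_out: int, d_in: int, r: int) -> int:
    # LoRA learns a low-rank update B @ A for a d_out x d_in weight,
    # where B is d_out x r and A is r x d_in
    return r * (d_out + d_in)

full = 4096 * 4096                            # 16,777,216 weights if fully trained
lora = lora_trainable_params(4096, 4096, 16)  # 131,072 trainable params at r=16
share = lora / full                           # well under 1% of the matrix
```

That sub-1% trainable fraction is why LoRA keeps optimizer state and gradient memory so small, which is the speed/memory ratio the answer above refers to.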
Q: Does it work on consumer GPUs? A: Yes, fine-tune 8B models on RTX 3060 (12GB) or even Google Colab T4.
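Putting the VRAM table and this answer together, a tiny fit-check sketch. The model footprints come from the table above; the GPU capacities are illustrative nominal figures, and the `fits` helper is hypothetical, not part of Unsloth's API:

```python
# Unsloth fine-tuning footprints from the VRAM table above (GB)
UNSLOTH_VRAM_GB = {"Llama 3 8B": 6, "Llama 3 70B": 48, "Mistral 7B": 5}

# Nominal GPU capacities (GB), for illustration only
GPU_VRAM_GB = {"RTX 3060": 12, "Colab T4": 15, "RTX 4090": 24, "A100 80GB": 80}

def fits(model: str, gpu: str) -> bool:
    """True if the model's Unsloth fine-tuning footprint fits the GPU."""
    return UNSLOTH_VRAM_GB[model] <= GPU_VRAM_GB[gpu]
```

By this rough check, an RTX 3060 handles the 8B and 7B models, while the 70B model needs a 48GB-class card, matching the claims earlier in the article.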