Apr 7, 2026 · 2 min read

Unsloth — Fine-Tune LLMs 2x Faster with 80% Less Memory

Fine-tune Llama, Mistral, Gemma, and Qwen models 2x faster using 80% less VRAM. Open-source with no accuracy loss. Train on a single GPU what used to need four.

What is Unsloth?

Unsloth is an open-source library that makes LLM fine-tuning 2x faster while using 80% less GPU memory — with zero accuracy loss. It achieves this through custom Triton kernels, intelligent memory management, and optimized backpropagation. Fine-tune a 70B model on a single 48GB GPU.

Answer-Ready: Unsloth is an open-source LLM fine-tuning library that achieves 2x speed and 80% VRAM reduction with no accuracy loss. Supports Llama, Mistral, Gemma, Qwen, and Phi models. Fine-tune 70B models on a single GPU. 25k+ GitHub stars.

Best for: ML engineers fine-tuning open-source LLMs with limited GPU resources. Works with: Llama 3, Mistral, Gemma 2, Qwen 2.5, Phi-3. Setup time: Under 5 minutes.
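Setup really is minimal: Unsloth is distributed on PyPI, so a single pip command installs it (a CUDA-capable GPU is required for training):

```shell
# Install Unsloth from PyPI (pulls in compatible torch/trl/peft versions)
pip install unsloth
```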

Core Features

1. 2x Faster Training

Custom Triton kernels optimize attention, RoPE, and cross-entropy:

Standard fine-tuning: 8 hours on A100
Unsloth fine-tuning:  4 hours on A100 (same accuracy)
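The fused cross-entropy kernel is a good example of where the speedup comes from: it computes exactly the same loss as a naive implementation, just in one pass. A stdlib-only reference of the math being fused (this is an illustration, not Unsloth's kernel code):

```python
import math

def cross_entropy(logits, target):
    """Numerically stable cross-entropy for one row of logits.

    Computes -log_softmax(logits)[target] the way a fused kernel
    would: subtract the max, take log-sum-exp, then index.
    """
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return lse - logits[target]

loss = cross_entropy([2.0, 1.0, 0.1], target=0)  # ~0.417
```

Because the result is identical term-by-term, fusing these steps into one Triton kernel changes speed and memory traffic, not accuracy.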

2. 80% Less VRAM

Model         Standard   Unsloth
Llama 3 8B    24 GB      6 GB
Llama 3 70B   160 GB     48 GB
Mistral 7B    20 GB      5 GB
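The table's numbers line up with simple quantization arithmetic: 4-bit weights take a quarter of the space of 16-bit weights. A back-of-the-envelope estimate for the weights alone (ignoring activations, gradients, and optimizer state, so real usage is higher):

```python
def weight_memory_gb(n_params_billion, bits_per_weight):
    """Approximate memory for model weights alone, in decimal GB."""
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

fp16 = weight_memory_gb(8, 16)  # Llama 3 8B in 16-bit -> ~16 GB
q4   = weight_memory_gb(8, 4)   # same model in 4-bit  -> ~4 GB
```

LoRA keeps the trainable-parameter overhead small on top of this, which is how an 8B model fits in ~6GB end to end.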

3. Full Training Pipeline

from unsloth import FastLanguageModel
from trl import SFTTrainer
from datasets import load_dataset

# Load a 4-bit quantized base model
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Llama-3.1-8B-Instruct-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Add LoRA adapters (rank 16)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Train
dataset = load_dataset("your-dataset", split="train")
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    max_seq_length=2048,
    dataset_text_field="text",
)
trainer.train()

# Save the LoRA adapters
model.save_pretrained("my-finetuned-model")
tokenizer.save_pretrained("my-finetuned-model")
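The get_peft_model call injects LoRA adapters: instead of updating a full weight matrix W, training learns two small matrices A and B, and the effective weight becomes W + (alpha/r)·B·A. A stdlib-only sketch of that merge on toy matrices (an illustration of the math, not Unsloth's implementation):

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def lora_merge(W, A, B, r, alpha):
    """Return the merged weight W + (alpha/r) * B @ A, as in LoRA."""
    s = alpha / r
    BA = matmul(B, A)
    return [[w + s * d for w, d in zip(wr, dr)] for wr, dr in zip(W, BA)]

# 2x2 base weight with a rank-1 adapter (r=1, alpha=1 -> scale 1.0)
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # 2x1 (down-projection output)
A = [[0.5, 0.5]]     # 1x2 (up-projection input)
W_merged = lora_merge(W, A, B, r=1, alpha=1)
```

With r=16 on an 8B model, A and B together are a tiny fraction of W's parameters, which is why only the adapters need gradients and optimizer state.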

4. Export Formats

# Save as GGUF for Ollama
model.save_pretrained_gguf("model", tokenizer, quantization_method="q4_k_m")

# Save merged 16-bit weights
model.save_pretrained_merged("model-merged", tokenizer, save_method="merged_16bit")

# Push to Hugging Face
model.push_to_hub("username/my-model")
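The q4_k_m GGUF format stores weights in small blocks, each holding 4-bit integers plus a per-block scale. A simplified illustration of block-wise absmax 4-bit quantization (the real GGUF format is more elaborate, with sub-blocks and mixed precision):

```python
def quantize_block(values):
    """Quantize one block of floats to 4-bit ints with a shared scale."""
    scale = max(abs(v) for v in values) / 7 or 1.0  # avoid div-by-zero on all-zero blocks
    q = [max(-8, min(7, round(v / scale))) for v in values]
    return q, scale

def dequantize_block(q, scale):
    """Reconstruct approximate floats from the 4-bit ints."""
    return [x * scale for x in q]

block = [0.9, -0.35, 0.14, 0.7]
q, s = quantize_block(block)
approx = dequantize_block(q, s)  # close to block, within one scale step
```

Each value costs 4 bits plus a small amortized share of the scale, which is what makes the exported models Ollama-friendly on modest hardware.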

5. Supported Models

Model Family      Sizes
Llama 3/3.1/3.3   1B, 3B, 8B, 70B
Mistral/Mixtral   7B, 8x7B
Gemma 2           2B, 9B, 27B
Qwen 2.5          0.5B–72B
Phi-3/3.5         3.8B, 14B

Google Colab

A free notebook lets you fine-tune on Colab's T4 GPU:

  • Llama 3 8B fine-tuning in 30 minutes
  • Uses only 7GB VRAM (fits T4's 15GB)

FAQ

Q: Does it affect model quality? A: No, Unsloth produces mathematically identical results. The speedup comes from kernel optimization, not approximation.

Q: Can I use it for full fine-tuning (not LoRA)? A: Yes, but LoRA with Unsloth gives the best speed/memory ratio.

Q: Does it work on consumer GPUs? A: Yes, fine-tune 8B models on RTX 3060 (12GB) or even Google Colab T4.


Source and acknowledgements

Created by Unsloth AI. Licensed under Apache 2.0.

unslothai/unsloth — 25k+ stars
