# Unsloth — Fine-Tune LLMs 2x Faster with 80% Less Memory

> Fine-tune Llama, Mistral, Gemma, and Qwen models 2x faster using 80% less VRAM. Open-source with no accuracy loss. Train on a single GPU what used to need four.

## Quick Use

```bash
pip install unsloth
```

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.3-70B-Instruct-bnb-4bit",
    max_seq_length=4096,
    load_in_4bit=True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
    lora_dropout=0,
)
```
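
To get a feel for what `r=16` over the four attention projections means, the adapter size can be estimated by hand. LoRA adds two low-rank matrices per target module, so each module contributes `r * (d_in + d_out)` trainable parameters. The sketch below assumes the published Llama 3 70B shape (hidden size 8192, 80 layers, 8 key/value heads of dimension 128). Those dimensions are not stated on this page, and the helper names are illustrative.

```python
# Rough LoRA adapter size for the Quick Use configuration.
# Assumed (not from this page): Llama 3 70B shape of hidden size 8192,
# 80 decoder layers, and grouped-query attention with 8 KV heads of dim 128.

HIDDEN = 8192
KV_DIM = 8 * 128  # key/value projection width under grouped-query attention
NUM_LAYERS = 80

# (d_in, d_out) for each module listed in target_modules
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}


def lora_params_per_module(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) next to the frozen weight."""
    return r * (d_in + d_out)


def lora_trainable_params(r: int) -> int:
    """Total adapter parameters across all layers for the assumed shape."""
    per_layer = sum(
        lora_params_per_module(d_in, d_out, r)
        for d_in, d_out in MODULE_SHAPES.values()
    )
    return per_layer * NUM_LAYERS


def lora_scaling(lora_alpha: int, r: int) -> float:
    """Standard LoRA scales the adapter update by lora_alpha / r."""
    return lora_alpha / r


if __name__ == "__main__":
    total = lora_trainable_params(r=16)
    print(f"trainable LoRA parameters: {total:,}")
    print(f"update scaling: {lora_scaling(16, 16)}")
```

Under these assumptions the adapter holds about 65.5 million trainable parameters, roughly 0.1% of a 70B model, and `lora_alpha=16` with `r=16` gives an update scaling of 1.0.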

## What is Unsloth?

Unsloth is an open-source library that makes LLM fine-tuning about 2x faster while using up to 80% less GPU memory, with no accuracy loss. It achieves this through custom Triton kernels, careful memory management, and hand-optimized backpropagation. As a result, a 70B model can be fine-tuned on a single 48GB GPU.

**Answer-Ready**: Unsloth is an open-source LLM fine-tuning library that achieves 2x speed and 80% VRAM reduction with no accuracy loss. Supports Llama, Mistral, Gemma, Qwen, and Phi models. Fine-tune 70B models on a single GPU. 25k+ GitHub stars.

**Best for**: ML engineers fine-tuning open-source LLMs with limited GPU resources. **Works with**: Llama 3, Mistral, Gemma 2, Qwen 2.5, Phi-3. **Setup time**: Under 5 minutes.

## Core Features

### 1. 2x Faster Training
Custom Triton kernels for attention, RoPE, and cross-entropy loss cut training time roughly in half. An illustrative comparison for the same job:

```
Standard fine-tuning: 8 hours on A100
Unsloth fine-tuning:  4 hours on A100 (same accuracy)
```

### 2. 80% Less VRAM

| Model | Standard | Unsloth |
|-------|----------|---------|
| Llama 3 8B | 24GB | 6GB |
| Llama 3 70B | 160GB | 48GB |
| Mistral 7B | 20GB | 5GB |
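
The savings implied by the table can be checked with one line of arithmetic per row. The sketch below uses only the table's own figures; the function name is illustrative.

```python
# Percentage VRAM reduction implied by each row of the table above.
# Values are (standard_gb, unsloth_gb), copied from the table.
VRAM_GB = {
    "Llama 3 8B": (24, 6),
    "Llama 3 70B": (160, 48),
    "Mistral 7B": (20, 5),
}


def vram_reduction_pct(standard_gb: float, unsloth_gb: float) -> float:
    """Share of the standard footprint that is saved, in percent."""
    return 100.0 * (standard_gb - unsloth_gb) / standard_gb


if __name__ == "__main__":
    for model, (standard, unsloth) in VRAM_GB.items():
        pct = vram_reduction_pct(standard, unsloth)
        print(f"{model}: {pct:.0f}% less VRAM")
```

These rows work out to 70 to 75 percent, so the headline 80% figure is probably best read as an upper bound rather than a typical result.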

### 3. Full Training Pipeline

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from datasets import load_dataset

# Load model
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",  # Llama 3.3 ships only as 70B
    max_seq_length=2048,
    load_in_4bit=True,
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(model, r=16, lora_alpha=16)

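# Train (select the "train" split so the trainer gets a Dataset, not a DatasetDict)
dataset = load_dataset("your-dataset", split="train")
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    # Recent TRL releases expect the next two options inside SFTConfig instead
    max_seq_length=2048,
    dataset_text_field="text",  # column that holds the training text
)
trainer.train()

# Save the LoRA adapters and the tokenizer
model.save_pretrained("my-finetuned-model")
tokenizer.save_pretrained("my-finetuned-model")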
```
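
The pipeline above reads training text from a `text` column (`dataset_text_field="text"`), so raw instruction data usually has to be flattened into one string per example first. A minimal sketch, assuming hypothetical `instruction` and `response` columns and an illustrative prompt template. Neither comes from Unsloth itself, and instruct models typically expect their own chat template instead.

```python
# Flatten instruction/response records into the single "text" field
# that the trainer is configured to read.
# Assumed: column names "instruction" and "response", and this template.

PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n{response}{eos}"


def to_text(example: dict, eos_token: str) -> dict:
    """Return a record with a "text" field; the EOS token marks the end of the answer."""
    text = PROMPT_TEMPLATE.format(
        instruction=example["instruction"].strip(),
        response=example["response"].strip(),
        eos=eos_token,
    )
    return {"text": text}


if __name__ == "__main__":
    sample = {"instruction": "Translate 'hello' to French. ", "response": " bonjour"}
    print(to_text(sample, eos_token="</s>")["text"])
```

With Hugging Face `datasets`, the same function could be applied through `dataset.map(lambda ex: to_text(ex, tokenizer.eos_token))` before the dataset is handed to the trainer.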

### 4. Export Formats

```python
# Save as GGUF for Ollama
model.save_pretrained_gguf("model", tokenizer, quantization_method="q4_k_m")

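# Save merged 16-bit weights (LoRA adapters folded into the base model)
model.save_pretrained_merged("model-merged", tokenizer, save_method="merged_16bit")

# Push the LoRA adapters to Hugging Face (requires a Hub login)
model.push_to_hub("username/my-model")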
```
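
Once a GGUF file exists, Ollama can load it through a Modelfile whose `FROM` line points at the file. The sketch below only writes that Modelfile. It assumes the export directory holds the `.gguf` file produced by the step above; the exact filename Unsloth writes is not stated here, so the code discovers it by globbing. The function name is hypothetical.

```python
from pathlib import Path


def write_modelfile(export_dir: str) -> Path:
    """Write a minimal Ollama Modelfile next to the first .gguf file in export_dir."""
    export_path = Path(export_dir)
    gguf_files = sorted(export_path.glob("*.gguf"))
    if not gguf_files:
        raise FileNotFoundError(f"no .gguf file found in {export_path}")

    modelfile = export_path / "Modelfile"
    # A relative FROM path is written relative to the Modelfile's own directory.
    modelfile.write_text(f"FROM ./{gguf_files[0].name}\n")
    return modelfile


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for a real export; the filename is made up for the demo.
        (Path(tmp) / "example.Q4_K_M.gguf").touch()
        print(write_modelfile(tmp).read_text())
```

The model can then be registered with `ollama create my-model -f model/Modelfile` and started with `ollama run my-model`, where `my-model` is a name of your choosing.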

### 5. Supported Models

| Model Family | Sizes |
|-------------|-------|
| Llama 3/3.1/3.2/3.3 | 1B, 3B, 8B, 70B |
| Mistral/Mixtral | 7B, 8x7B |
| Gemma 2 | 2B, 9B, 27B |
| Qwen 2.5 | 0.5B-72B |
| Phi-3/3.5 | 3.8B, 14B |

## Google Colab

Free notebooks are available for fine-tuning on Colab's T4 GPU:
- Fine-tunes Llama 3 8B in about 30 minutes
- Uses only 7GB of VRAM, well within the T4's 15GB

## FAQ

**Q: Does it affect model quality?**
A: No, Unsloth produces mathematically identical results. The speedup comes from kernel optimization, not approximation.

**Q: Can I use it for full fine-tuning (not LoRA)?**
A: Yes, but LoRA with Unsloth gives the best speed/memory ratio.

**Q: Does it work on consumer GPUs?**
A: Yes. You can fine-tune 8B models on an RTX 3060 (12GB) or even a Google Colab T4.

## Source & Thanks

> Created by [Unsloth AI](https://github.com/unslothai). Licensed under Apache 2.0.
>
> [unslothai/unsloth](https://github.com/unslothai/unsloth) — 25k+ stars

<!-- ZH -->


## Quick Start

```bash
pip install unsloth
```

Three lines of code load a 4-bit quantized model and start fine-tuning.

## What is Unsloth?

Unsloth is an open-source LLM fine-tuning library that delivers 2x speed and 80% memory savings with no loss in accuracy. Fine-tune 70B models on a single GPU.

**In one sentence**: Open-source LLM fine-tuning library — 2x speed, 80% VRAM savings, zero accuracy loss, supports Llama/Mistral/Gemma/Qwen — 25k+ GitHub stars.

**For**: ML engineers with limited GPU resources who need to fine-tune open-source LLMs.

## Core Features

### 1. 2x Training Speed
Custom Triton kernels optimize attention and backpropagation.

### 2. 80% VRAM Savings
Llama 3 8B needs only 6GB VRAM (vs. 24GB).

### 3. Multi-Format Export
GGUF (Ollama), merged 16-bit, and Hugging Face Hub.

### 4. Broad Model Support
Llama, Mistral, Gemma, Qwen, and Phi families.

## FAQ

**Q: Does it affect model quality?**
A: No — it's mathematically equivalent; speedups come from kernel optimization.

**Q: Can I use consumer GPUs?**
A: Yes — an RTX 3060 (12GB) can fine-tune 8B models.

## Source & Thanks

> [unslothai/unsloth](https://github.com/unslothai/unsloth) — 25k+ stars, Apache 2.0

---
Source: https://tokrepo.com/en/workflows/unsloth-fine-tune-llms-2x-faster-80-less-memory-149b2641
Author: Prompt Lab