# Unsloth — Fine-Tune LLMs 2x Faster with 80% Less Memory

> Fine-tune Llama, Mistral, Gemma, and Qwen models 2x faster using 80% less VRAM. Open source with no accuracy loss. Train on a single GPU what used to need four.

## Install

```bash
pip install unsloth
```

## Quick Use

```python
from unsloth import FastLanguageModel

# Load a 4-bit quantized model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.3-70B-Instruct-bnb-4bit",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters to the attention projections
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
    lora_dropout=0,
)
```

## What is Unsloth?

Unsloth is an open-source library that makes LLM fine-tuning 2x faster while using 80% less GPU memory, with zero accuracy loss. It achieves this through custom Triton kernels, intelligent memory management, and optimized backpropagation. Fine-tune a 70B model on a single 48GB GPU.

**Answer-Ready**: Unsloth is an open-source LLM fine-tuning library that achieves 2x speed and 80% VRAM reduction with no accuracy loss. Supports Llama, Mistral, Gemma, Qwen, and Phi models. Fine-tune 70B models on a single GPU. 25k+ GitHub stars.

**Best for**: ML engineers fine-tuning open-source LLMs with limited GPU resources.

**Works with**: Llama 3, Mistral, Gemma 2, Qwen 2.5, Phi-3.

**Setup time**: Under 5 minutes.

## Core Features

### 1. 2x Faster Training

Custom Triton kernels optimize attention, RoPE, and cross-entropy:

```
Standard fine-tuning: 8 hours on A100
Unsloth fine-tuning:  4 hours on A100 (same accuracy)
```

### 2. 80% Less VRAM

| Model | Standard | Unsloth |
|-------|----------|---------|
| Llama 3 8B | 24GB | 6GB |
| Llama 3 70B | 160GB | 48GB |
| Mistral 7B | 20GB | 5GB |
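One way to see where the VRAM savings in the table above come from: the 4-bit base weights stay frozen, and the LoRA adapters that are actually trained are a tiny fraction of the model. A rough back-of-the-envelope sketch (the Llama 3 8B attention dimensions below are the published architecture figures; the `r=16` setting mirrors the Quick Use example — not an official Unsloth calculation):

```python
# Rough LoRA parameter count for Llama 3 8B attention projections at r=16.
# Llama 3 8B: hidden size 4096, 32 layers, grouped-query attention with
# 8 KV heads, so k/v projections output 1024 dims instead of 4096.
r = 16
layers = 32
shapes = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
}

# Each LoRA adapter adds r * (d_in + d_out) parameters per projection.
per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
total_trainable = per_layer * layers

print(f"Trainable LoRA params: {total_trainable:,}")            # ~13.6M
print(f"Fraction of the 8B base: {total_trainable / 8e9:.4%}")  # ~0.17%

# The frozen base weights dominate memory: 8B params at 4 bits each
base_gb = 8e9 * 0.5 / 1e9
print(f"4-bit base weights: ~{base_gb:.0f} GB")
```

With ~4 GB of frozen 4-bit weights plus adapters, optimizer state, and activations, the ~6 GB figure in the table is plausible; standard 16-bit full fine-tuning carries 16 GB of weights alone before optimizer state.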
### 3. Full Training Pipeline

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from datasets import load_dataset

# Load a 4-bit quantized model
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(model, r=16, lora_alpha=16)

# Train
dataset = load_dataset("your-dataset", split="train")
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
)
trainer.train()

# Save the LoRA adapters
model.save_pretrained("my-finetuned-model")
```

### 4. Export Formats

```python
# Save as GGUF for Ollama / llama.cpp
model.save_pretrained_gguf("model", tokenizer, quantization_method="q4_k_m")

# Save merged 16-bit weights
model.save_pretrained_merged("model-merged", tokenizer)

# Push to the Hugging Face Hub
model.push_to_hub("username/my-model")
```

### 5. Supported Models

| Model Family | Sizes |
|--------------|-------|
| Llama 3/3.1/3.3 | 1B, 3B, 8B, 70B |
| Mistral/Mixtral | 7B, 8x7B |
| Gemma 2 | 2B, 9B, 27B |
| Qwen 2.5 | 0.5B-72B |
| Phi-3/3.5 | 3.8B, 14B |

## Google Colab

Free notebook to fine-tune on Colab's T4 GPU:

- Llama 3 8B fine-tuning in 30 minutes
- Uses only 7GB VRAM (fits in the T4's 15GB)

## FAQ

**Q: Does it affect model quality?**
A: No. Unsloth produces mathematically identical results; the speedup comes from kernel optimization, not approximation.

**Q: Can I use it for full fine-tuning (not LoRA)?**
A: Yes, but LoRA with Unsloth gives the best speed/memory ratio.

**Q: Does it work on consumer GPUs?**
A: Yes. You can fine-tune 8B models on an RTX 3060 (12GB) or even a Google Colab T4.

## Source & Thanks

> Created by [Unsloth AI](https://github.com/unslothai). Licensed under Apache 2.0.
>
> [unslothai/unsloth](https://github.com/unslothai/unsloth) — 25k+ stars
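The training pipeline above passes `dataset_text_field="text"` to `SFTTrainer`, which means the dataset must expose a single `text` column. As a minimal sketch of mapping raw instruction/response pairs into that column, assuming an Alpaca-style template (the template wording is an assumption for illustration, not part of Unsloth's API):

```python
# Map instruction/response records into the single "text" field that
# SFTTrainer(dataset_text_field="text") expects.
TEMPLATE = """### Instruction:
{instruction}

### Response:
{response}"""

def format_example(example: dict) -> dict:
    """Return a record with a 'text' column built from the template."""
    return {"text": TEMPLATE.format(**example)}

# With Hugging Face `datasets`, this is typically applied before training:
#     dataset = dataset.map(format_example)
record = {"instruction": "Translate 'hello' to French.", "response": "Bonjour."}
print(format_example(record)["text"])
```

For chat models, the tokenizer's built-in chat template (`tokenizer.apply_chat_template`) is usually the safer choice, since it matches the format the base model was trained on.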
---
Source: https://tokrepo.com/en/workflows/149b2641-d550-4faf-96de-3c6aee66ec58
Author: Prompt Lab