# Unsloth — Fine-Tune LLMs 2x Faster with 80% Less Memory

> Fine-tune Llama, Mistral, Gemma, and Qwen models 2x faster using 80% less VRAM. Open source with no accuracy loss. Train on a single GPU what used to need four.

## Install

```bash
pip install unsloth
```

## Quick Use

```python
from unsloth import FastLanguageModel

# Load a 4-bit quantized model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.3-70B-Instruct-bnb-4bit",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters to the attention projections
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
    lora_dropout=0,
)
```

## What is Unsloth?

Unsloth is an open-source library that makes LLM fine-tuning 2x faster while using 80% less GPU memory, with zero accuracy loss. It achieves this through custom Triton kernels, intelligent memory management, and optimized backpropagation. Fine-tune a 70B model on a single 48GB GPU.

**Answer-Ready**: Unsloth is an open-source LLM fine-tuning library that achieves 2x speed and 80% VRAM reduction with no accuracy loss. Supports Llama, Mistral, Gemma, Qwen, and Phi models. Fine-tune 70B models on a single GPU. 25k+ GitHub stars.

**Best for**: ML engineers fine-tuning open-source LLMs with limited GPU resources.

**Works with**: Llama 3, Mistral, Gemma 2, Qwen 2.5, Phi-3.

**Setup time**: Under 5 minutes.

## Core Features

### 1. 2x Faster Training

Custom Triton kernels optimize attention, RoPE, and cross-entropy:

```
Standard fine-tuning: 8 hours on A100
Unsloth fine-tuning:  4 hours on A100 (same accuracy)
```

### 2. 80% Less VRAM

| Model | Standard | Unsloth |
|-------|----------|---------|
| Llama 3 8B | 24GB | 6GB |
| Llama 3 70B | 160GB | 48GB |
| Mistral 7B | 20GB | 5GB |
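One way to see where the VRAM savings in the table above come from: the 4-bit base weights stay frozen, and the LoRA adapters that are actually trained are a tiny fraction of the model. A rough back-of-the-envelope sketch (the Llama 3 8B attention dimensions below are the published architecture figures; the `r=16` setting mirrors the Quick Use example — not an official Unsloth calculation):

```python
# Rough LoRA parameter count for Llama 3 8B attention projections at r=16.
# Llama 3 8B: hidden size 4096, 32 layers, grouped-query attention with
# 8 KV heads, so k/v projections output 1024 dims instead of 4096.
r = 16
layers = 32
shapes = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
}

# Each LoRA adapter adds r * (d_in + d_out) parameters per projection.
per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
total_trainable = per_layer * layers

print(f"Trainable LoRA params: {total_trainable:,}")            # ~13.6M
print(f"Fraction of the 8B base: {total_trainable / 8e9:.4%}")  # ~0.17%

# The frozen base weights dominate memory: 8B params at 4 bits each
base_gb = 8e9 * 0.5 / 1e9
print(f"4-bit base weights: ~{base_gb:.0f} GB")
```

With ~4 GB of frozen 4-bit weights plus adapters, optimizer state, and activations, the ~6 GB figure in the table is plausible; standard 16-bit full fine-tuning carries 16 GB of weights alone before optimizer state.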
### 3. Full Training Pipeline

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from datasets import load_dataset

# Load a 4-bit quantized model
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(model, r=16, lora_alpha=16)

# Train
dataset = load_dataset("your-dataset", split="train")
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
)
trainer.train()

# Save the LoRA adapters
model.save_pretrained("my-finetuned-model")
```

### 4. Export Formats

```python
# Save as GGUF for Ollama / llama.cpp
model.save_pretrained_gguf("model", tokenizer, quantization_method="q4_k_m")

# Save merged 16-bit weights
model.save_pretrained_merged("model-merged", tokenizer)

# Push to the Hugging Face Hub
model.push_to_hub("username/my-model")
```

### 5. Supported Models

| Model Family | Sizes |
|--------------|-------|
| Llama 3/3.1/3.3 | 1B, 3B, 8B, 70B |
| Mistral/Mixtral | 7B, 8x7B |
| Gemma 2 | 2B, 9B, 27B |
| Qwen 2.5 | 0.5B-72B |
| Phi-3/3.5 | 3.8B, 14B |

## Google Colab

Free notebook to fine-tune on Colab's T4 GPU:

- Llama 3 8B fine-tuning in 30 minutes
- Uses only 7GB VRAM (fits in the T4's 15GB)

## FAQ

**Q: Does it affect model quality?**
A: No. Unsloth produces mathematically identical results; the speedup comes from kernel optimization, not approximation.

**Q: Can I use it for full fine-tuning (not LoRA)?**
A: Yes, but LoRA with Unsloth gives the best speed/memory ratio.

**Q: Does it work on consumer GPUs?**
A: Yes. You can fine-tune 8B models on an RTX 3060 (12GB) or even a Google Colab T4.

## Source & Thanks

> Created by [Unsloth AI](https://github.com/unslothai). Licensed under Apache 2.0.
>
> [unslothai/unsloth](https://github.com/unslothai/unsloth) — 25k+ stars
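The training pipeline above passes `dataset_text_field="text"` to `SFTTrainer`, which means the dataset must expose a single `text` column. As a minimal sketch of mapping raw instruction/response pairs into that column, assuming an Alpaca-style template (the template wording is an assumption for illustration, not part of Unsloth's API):

```python
# Map instruction/response records into the single "text" field that
# SFTTrainer(dataset_text_field="text") expects.
TEMPLATE = """### Instruction:
{instruction}

### Response:
{response}"""

def format_example(example: dict) -> dict:
    """Return a record with a 'text' column built from the template."""
    return {"text": TEMPLATE.format(**example)}

# With Hugging Face `datasets`, this is typically applied before training:
#     dataset = dataset.map(format_example)
record = {"instruction": "Translate 'hello' to French.", "response": "Bonjour."}
print(format_example(record)["text"])
```

For chat models, the tokenizer's built-in chat template (`tokenizer.apply_chat_template`) is usually the safer choice, since it matches the format the base model was trained on.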
---
Source: https://tokrepo.com/en/workflows/149b2641-d550-4faf-96de-3c6aee66ec58
Author: Prompt Lab