# Unsloth: Fine-Tune LLMs 2x Faster with 80% Less Memory

> Fine-tune Llama, Mistral, Gemma, and Qwen models 2x faster using 80% less VRAM. Open-source with no accuracy loss. Train on a single GPU what used to need four.

## Quick Use

```bash
pip install unsloth
```

```python
from unsloth import FastLanguageModel

# Load a pre-quantized 4-bit checkpoint
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.3-70B-Instruct-bnb-4bit",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters to the attention projections
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
    lora_dropout=0,
)
```

## What is Unsloth?

Unsloth is an open-source library that makes LLM fine-tuning 2x faster while using 80% less GPU memory, with no accuracy loss. It achieves this through custom Triton GPU kernels, intelligent memory management, and optimized backpropagation. For example, a 70B model can be fine-tuned on a single 48GB GPU.

**Answer-Ready**: Unsloth is an open-source LLM fine-tuning library that achieves 2x speed and 80% VRAM reduction with no accuracy loss. Supports Llama, Mistral, Gemma, Qwen, and Phi models. Fine-tune 70B models on a single GPU. 25k+ GitHub stars.

**Best for**: ML engineers fine-tuning open-source LLMs with limited GPU resources.

**Works with**: Llama 3, Mistral, Gemma 2, Qwen 2.5, Phi-3.

**Setup time**: Under 5 minutes.

## Core Features

### 1. 2x Faster Training

Custom Triton kernels optimize attention, RoPE, and cross-entropy:

```
Standard fine-tuning: 8 hours on A100
Unsloth fine-tuning:  4 hours on A100 (same accuracy)
```

### 2. 80% Less VRAM

| Model | Standard VRAM | Unsloth VRAM |
|-------|---------------|--------------|
| Llama 3 8B | 24GB | 6GB |
| Llama 3 70B | 160GB | 48GB |
| Mistral 7B | 20GB | 5GB |

By these figures the reduction is 75% for the 8B and 7B models and 70% for the 70B model, somewhat short of the headline 80%.

### 3. Full Training Pipeline

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from datasets import load_dataset

# Load a pre-quantized 4-bit model
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(model, r=16, lora_alpha=16)

# Train on the "train" split; the dataset needs a "text" column
dataset = load_dataset("your-dataset", split="train")
# Depending on the TRL version, max_seq_length and dataset_text_field
# may need to be passed through SFTConfig instead of directly.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    max_seq_length=2048,
    dataset_text_field="text",
)
trainer.train()

# Save the LoRA adapters and the tokenizer
model.save_pretrained("my-finetuned-model")
tokenizer.save_pretrained("my-finetuned-model")
```

### 4. Export Formats

```python
# Save as GGUF for Ollama
model.save_pretrained_gguf("model", tokenizer, quantization_method="q4_k_m")

# Save merged 16-bit
model.save_pretrained_merged("model-merged", tokenizer, save_method="merged_16bit")

# Push to Hugging Face
model.push_to_hub("username/my-model")
```

### 5. Supported Models

| Model Family | Sizes |
|-------------|-------|
| Llama 3/3.1/3.2/3.3 | 1B, 3B, 8B, 70B |
| Mistral/Mixtral | 7B, 8x7B |
| Gemma 2 | 2B, 9B, 27B |
| Qwen 2.5 | 0.5B-72B |
| Phi-3/3.5 | 3.8B, 14B |

## Google Colab

A free notebook is available for fine-tuning on Colab's T4 GPU:

- Llama 3 8B fine-tuning in 30 minutes
- Uses only 7GB VRAM (fits T4's 15GB)

## FAQ

**Q: Does it affect model quality?**
A: No, Unsloth produces mathematically identical results. The speedup comes from kernel optimization, not approximation.

**Q: Can I use it for full fine-tuning (not LoRA)?**
A: Yes, but LoRA with Unsloth gives the best speed/memory ratio.

**Q: Does it work on consumer GPUs?**
A: Yes, you can fine-tune 8B models on an RTX 3060 (12GB) or even a Google Colab T4.

## Source & Thanks

> Created by [Unsloth AI](https://github.com/unslothai). Licensed under Apache 2.0.
>
> [unslothai/unsloth](https://github.com/unslothai/unsloth), 25k+ stars

## Quick Start

```bash
pip install unsloth
```

A few lines of code load a 4-bit quantized model and attach LoRA adapters, ready for fine-tuning.

## What is Unsloth?
Unsloth is an open-source LLM fine-tuning library that delivers 2x speed and 80% memory savings with no loss in accuracy. Fine-tune 70B models on a single GPU.

**In one sentence**: Open-source LLM fine-tuning library with 2x speed, 80% VRAM savings, and zero accuracy loss; supports Llama/Mistral/Gemma/Qwen; 25k+ GitHub stars.

**For**: ML engineers with limited GPU resources who need to fine-tune open-source LLMs.

## Core Features

### 1. 2x Training Speed

Custom Triton kernels optimize attention and backpropagation.

### 2. 80% VRAM Savings

Llama 3 8B needs only 6GB VRAM (vs. 24GB).

### 3. Multi-Format Export

GGUF (Ollama), merged 16-bit, and Hugging Face Hub.

### 4. Broad Model Support

Llama, Mistral, Gemma, Qwen, and Phi families.

## FAQ

**Q: Does it affect model quality?**
A: No. It is mathematically equivalent; speedups come from kernel optimization.

**Q: Can I use consumer GPUs?**
A: Yes. An RTX 3060 (12GB) can fine-tune 8B models.

## Source & Thanks

> [unslothai/unsloth](https://github.com/unslothai/unsloth), 25k+ stars, Apache 2.0

---

Source: https://tokrepo.com/en/workflows/unsloth-fine-tune-llms-2x-faster-80-less-memory-149b2641
Author: Prompt Lab
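
To make the `r=16` setting passed to `get_peft_model` more concrete, the sketch below counts the trainable parameters that LoRA adds when adapters are attached to the four attention projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`). The layer dimensions are assumptions taken from the publicly documented Llama 3 8B configuration (hidden size 4096, 32 layers, 8 key/value heads of dimension 128); they are not stated on this page. The helper name `lora_param_count` is hypothetical and not part of Unsloth.

```python
# Sketch: count LoRA trainable parameters for attention-only adapters.
# All dimensions are assumed from the public Llama 3 8B config, not from this page.

HIDDEN = 4096      # model hidden size (assumed)
KV_DIM = 8 * 128   # 8 key/value heads x head dim 128 (assumed, grouped-query attention)
LAYERS = 32        # number of transformer layers (assumed)

# (input_dim, output_dim) of each projection that receives an adapter
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}


def lora_param_count(rank: int) -> int:
    """Each adapter adds two low-rank matrices: A (rank x in) and B (out x rank)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in PROJECTIONS.values())
    return per_layer * LAYERS


trainable = lora_param_count(rank=16)
print(f"trainable LoRA parameters: {trainable:,}")
```

Under these assumptions the adapters hold about 13.6 million trainable weights, a small fraction of the roughly 8 billion frozen base weights. Gradients and optimizer state are only needed for the adapter weights, which is one reason LoRA fine-tuning fits in far less memory than full fine-tuning.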