Skills · Mar 31, 2026 · 2 min read

Unsloth — 2x Faster Local LLM Training & Inference

Unsloth is a unified local interface for running and training AI models. 58.7K+ GitHub stars. 2x faster training with 70% less VRAM across 500+ models including Qwen, DeepSeek, Llama, Gemma. Web UI wi

AI Open Source · Community

Agent-ready

Install with prior review

This asset requires review. The copied prompt requests a dry run, shows the writes it would make, and continues only after confirmation.

Needs Confirmation · 64/100 · Policy: confirm

Agent surface: Any MCP/CLI agent

Type: Skill

Install: Single

Trust: Established

Entry

Unsloth — 2x Faster Local LLM Training & Inference

Command with prior review

npx -y tokrepo@latest install a69b498a-76d7-4cb4-b4fd-d4006a89b5a0 --target codex

Do a dry run first, confirm the writes, and then run this command.

TL;DR

Unsloth speeds up LLM fine-tuning by up to 2x while cutting VRAM usage by 70%, and supports 500+ models through a web UI or CLI.

§01

What it is

Unsloth is a unified local interface for running and training AI models. It provides up to 2x faster training with 70% less VRAM usage across 500+ models including Qwen, DeepSeek, Llama, and Gemma. It includes a web UI with one-click fine-tuning, a CLI for automated workflows, and full compatibility with the Hugging Face ecosystem.

It targets ML engineers and developers who want to fine-tune LLMs on consumer GPUs without expensive cloud compute, and researchers who need faster iteration cycles.

§02

How it saves time or tokens

Unsloth's memory optimizations let you fine-tune, on a single consumer GPU, models that would otherwise require multiple expensive GPUs. For example, a model that needs 48GB of VRAM with standard training may need only about 14GB with Unsloth (70% less). That fits within the 24GB of an RTX 4090, so you can train locally instead of renting an A100 and avoid the cloud compute cost.
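As a rough check on the figures in this section, the arithmetic can be sketched in a few lines of Python. The function name and the 24GB capacity assumed for the RTX 4090 are illustrative; the 70% reduction is the headline figure quoted on this page, not a measured result for any particular model.

```python
def vram_with_reduction(baseline_gb: float, reduction: float = 0.70) -> float:
    """Estimate VRAM needed after a fractional reduction (0.70 means 70% less)."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return baseline_gb * (1.0 - reduction)


RTX_4090_VRAM_GB = 24  # assumption: stock RTX 4090 memory capacity

estimate = vram_with_reduction(48)  # 48 GB baseline taken from the text
fits = estimate <= RTX_4090_VRAM_GB
print(f"Estimated VRAM: {estimate:.1f} GB, fits on RTX 4090: {fits}")
```

Actual savings depend on the model, sequence length, and batch size, so treat the result as a planning estimate only.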

§03

How to use

Install:

curl -fsSL https://unsloth.ai/install.sh | sh

Or via pip:

pip install unsloth

Fine-tune a model:

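from unsloth import FastLanguageModel

# Load the base model with 4-bit weights to reduce VRAM usage (QLoRA-style).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name='unsloth/Llama-3.2-3B-Instruct',
    max_seq_length=2048,  # maximum sequence length in tokens
    load_in_4bit=True,
)

# Attach LoRA adapters to the attention projection layers.
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16,  # LoRA rank and scaling numerator
    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj'],
)

# Train with your dataset using the standard Hugging Face Trainer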

Or use the web UI for no-code fine-tuning.
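To get a feel for what the LoRA settings in the Python example (r=16 on the four attention projections) mean in practice, the sketch below counts the trainable adapter parameters. It relies only on the standard LoRA construction, where each adapted weight matrix gains two low-rank factors. The layer dimensions are assumptions chosen for illustration and should be read from the real model config; the `Projection` class and `lora_params` helper are hypothetical names, not part of Unsloth.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Projection:
    name: str
    in_features: int
    out_features: int


def lora_params(proj: Projection, r: int) -> int:
    """LoRA adds A (r x in) and B (out x r) for each adapted weight matrix."""
    return r * (proj.in_features + proj.out_features)


# Assumed dimensions for illustration only; check the model's config
# before relying on these numbers.
HIDDEN = 3072     # assumed hidden size
KV_DIM = 1024     # assumed key/value width (grouped-query attention)
NUM_LAYERS = 28   # assumed number of transformer layers

projections = [
    Projection("q_proj", HIDDEN, HIDDEN),
    Projection("k_proj", HIDDEN, KV_DIM),
    Projection("v_proj", HIDDEN, KV_DIM),
    Projection("o_proj", HIDDEN, HIDDEN),
]

r, lora_alpha = 16, 16
per_layer = sum(lora_params(p, r) for p in projections)
total = per_layer * NUM_LAYERS
scaling = lora_alpha / r  # standard LoRA scales the update by alpha / r

print(f"Trainable LoRA parameters: {total:,} (scaling = {scaling})")
```

Under these assumed dimensions the adapters add a few million trainable parameters, a small fraction of a multi-billion-parameter base model, which is why LoRA training fits in far less memory than full fine-tuning.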

§04

Example

Metric	Standard training	Unsloth
Training speed	1x	2x
VRAM usage	100%	30%
RTX 4090 max model	7B	20B+
Cloud A100 cost per unit of work	$3 (1 hr at $3/hr)	$1.50 (0.5 hr at $3/hr)
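The cost row in the table reflects a shorter runtime rather than a lower hourly rate. A minimal sketch of that calculation follows; the `job_cost` helper and the 10-hour job length are hypothetical, and the $3/hr rate is the figure from the table.

```python
def job_cost(baseline_hours: float, hourly_rate: float, speedup: float = 1.0) -> float:
    """Total rental cost when a job runs `speedup` times faster than baseline."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (baseline_hours / speedup) * hourly_rate


RATE = 3.0           # $/hr for a cloud A100, from the table
BASELINE_HOURS = 10  # hypothetical job length, chosen for illustration

standard = job_cost(BASELINE_HOURS, RATE)
faster = job_cost(BASELINE_HOURS, RATE, speedup=2.0)
print(f"standard: ${standard:.2f}, with 2x speedup: ${faster:.2f}")
```

The same function can be used with a measured speedup from your own runs in place of the nominal 2x.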

§05

Related on TokRepo

Local LLM tools -- tools for running LLMs locally
AI tools for coding -- developer tools for AI

§06

Common pitfalls

Unsloth optimizations are specific to certain model architectures. Check compatibility before starting a training run with a new model.
4-bit training (QLoRA) reduces VRAM usage further but may slightly affect model quality compared to full-precision LoRA.
The web UI is convenient for getting started but the Python API provides more control for advanced training configurations.

Frequently asked questions

Which GPUs does Unsloth support?

Unsloth supports NVIDIA GPUs with CUDA (RTX 3060 and newer are recommended). Apple Silicon support is available through the MLX backend. AMD GPUs have experimental support via ROCm. The VRAM savings are most impactful on consumer GPUs like the RTX 4090 where memory is limited.

Does Unsloth support LoRA and QLoRA?

Yes. Unsloth fully supports LoRA (Low-Rank Adaptation) and QLoRA (Quantized LoRA) training methods. QLoRA combines 4-bit quantization with LoRA to minimize VRAM usage. Both methods produce models compatible with the standard Hugging Face ecosystem.
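To illustrate why 4-bit quantization matters for VRAM, the sketch below estimates the memory taken by model weights alone at different bit widths. It is a simplification under stated assumptions: it ignores quantization metadata, activations, gradients, optimizer state, and the LoRA adapter weights, and the 3B parameter count and `weight_memory_gb` helper are hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 3e9  # hypothetical 3B-parameter model

for bits in (16, 4):
    print(f"{bits}-bit weights: {weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

Going from 16-bit to 4-bit weights cuts the weight footprint to a quarter, which is the part of the budget that QLoRA targets.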

Can I export Unsloth-trained models to GGUF?

Yes. Unsloth can export trained models to GGUF format for use with llama.cpp, Ollama, and other inference engines. This lets you train with Unsloth and deploy with your preferred serving solution. The export handles quantization and format conversion automatically.

Is Unsloth free?

Unsloth has an open-source version that is free for personal and commercial use. A Pro version offers additional features like longer context support, more model architectures, and priority support. The free version covers most common fine-tuning use cases.

How does Unsloth achieve 2x speedup?

Unsloth uses custom CUDA kernels optimized for transformer attention patterns, intelligent memory management that reduces fragmentation, and efficient gradient checkpointing. These optimizations are applied automatically when you load a model through Unsloth's API. No manual tuning is needed.



Source and acknowledgments

Created by Unsloth AI. Licensed under Apache 2.0 / AGPL-3.0. unslothai/unsloth — 58,700+ GitHub stars

