# PEFT — Parameter-Efficient Fine-Tuning for Large Language Models

> PEFT is a Hugging Face library for adapting large pre-trained models using parameter-efficient methods such as LoRA, QLoRA, prompt tuning, and prefix tuning. It enables fine-tuning billion-parameter models on consumer hardware by updating only a small fraction of the weights.

## Quick Use

```bash
pip install peft transformers
python -c "
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained('gpt2')
config = LoraConfig(r=8, lora_alpha=32, target_modules=['c_attn'], lora_dropout=0.1)
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
"
```

## Introduction

PEFT (Parameter-Efficient Fine-Tuning) is a Hugging Face library that makes it practical to fine-tune large language models without requiring massive GPU clusters. By updating only a tiny subset of model parameters, PEFT methods such as LoRA achieve results close to full fine-tuning while using a fraction of the memory and compute.

## What PEFT Does

- Applies LoRA (Low-Rank Adaptation) to inject trainable rank-decomposition matrices into model layers
- Supports QLoRA for fine-tuning quantized 4-bit models on a single GPU
- Implements prompt tuning, prefix tuning, and P-tuning for soft-prompt-based adaptation
- Provides adapter methods including IA3 and AdaLoRA for different efficiency-accuracy tradeoffs
- Integrates seamlessly with Hugging Face Transformers, Diffusers, and Accelerate

## Architecture Overview

PEFT wraps a pre-trained model by injecting small trainable modules while freezing the original weights. For LoRA, this means adding low-rank matrices A and B to attention layers such that the effective weight becomes W + BA. During training only A and B are updated, reducing trainable parameters by 99%+.
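The W + BA update can be sketched in a few lines. The following is a minimal, framework-free illustration of the idea, not PEFT's internal implementation; the toy dimensions and the `alpha / r` scaling convention are assumptions for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 64, 64, 8  # toy layer dimensions and LoRA rank
alpha = 32                  # scaling factor (lora_alpha)

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init: update starts at 0

x = rng.standard_normal(d_in)

# LoRA forward pass: frozen path plus scaled low-rank update
h = W @ x + (alpha / r) * (B @ (A @ x))

# Because B starts at zero, the adapted layer initially equals the base layer
assert np.allclose(h, W @ x)

# Only A and B are trained, versus the full d_out x d_in matrix
trainable = A.size + B.size  # 2 * r * 64 = 1024
full = W.size                # 64 * 64 = 4096
print(f"trainable fraction: {trainable / full:.2%}")
```

At real model widths (d in the thousands) with small r, the trainable fraction shrinks to well under 1% per adapted layer.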
The adapter weights are saved separately and can be merged back into the base model for deployment.

## Self-Hosting & Configuration

- Install via pip: `pip install peft` alongside `transformers` and `accelerate`
- A LoRA config requires choosing `r` (rank), `lora_alpha` (scaling), and `target_modules` (which layers to adapt)
- QLoRA setup needs `bitsandbytes` for 4-bit quantization: `pip install bitsandbytes`
- Save adapters with `model.save_pretrained()` — only adapter weights are saved (typically 10-50 MB)
- Load and merge: `PeftModel.from_pretrained(base_model, adapter_path).merge_and_unload()`

## Key Features

- Train 7B+ parameter models on a single consumer GPU with QLoRA
- Multiple adapter methods: LoRA, AdaLoRA, IA3, prompt tuning, prefix tuning
- Adapter composition allows stacking and combining multiple fine-tuned adapters
- Native integration with the Hugging Face Hub for sharing and loading community adapters
- Supports multi-adapter inference for switching between tasks without reloading models

## Comparison with Similar Tools

- **Full Fine-Tuning** — updates all parameters for best accuracy but requires 4-10x more GPU memory
- **Unsloth** — optimized LoRA training with 2x speed gains but narrower model support
- **LLaMA-Factory** — GUI-driven fine-tuning with PEFT methods built in, but less flexible for custom setups
- **Axolotl** — config-driven fine-tuning wrapper that uses PEFT under the hood
- **OpenDelta** — alternative PEFT library from Tsinghua with similar methods but a smaller community

## FAQ

**Q: How much memory does LoRA save compared to full fine-tuning?**
A: LoRA typically reduces trainable parameters by 99%+ and GPU memory by 60-80%. A 7B model that needs 60 GB for full fine-tuning can be LoRA-trained in 16 GB with QLoRA.

**Q: Does LoRA fine-tuning match full fine-tuning quality?**
A: For most downstream tasks, LoRA with rank 16-64 achieves 95-100% of full fine-tuning performance.
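As a rough illustration of how rank scales adapter size, the sketch below assumes a hypothetical 4096-wide projection matrix; real counts depend on the model and on which `target_modules` are adapted:

```python
# LoRA adds two matrices per adapted weight: A (r x d_in) and B (d_out x r),
# so trainable parameters grow linearly with the rank r.
d_in = d_out = 4096  # hypothetical hidden size of one attention projection

for r in (8, 16, 64):
    lora_params = r * d_in + d_out * r  # A plus B for one adapted layer
    full_params = d_in * d_out          # the frozen base matrix
    print(f"r={r:<3} adapter params={lora_params:>9,} "
          f"({lora_params / full_params:.2%} of the layer)")
```

Even at rank 64, the adapter for such a layer stays around 3% of the base weight's parameter count, which is why raising the rank for knowledge-heavy tasks remains cheap.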
Tasks requiring broad knowledge changes may benefit from higher rank.

**Q: Can I combine multiple LoRA adapters?**
A: Yes, PEFT supports adapter composition via `add_adapter()` and weighted merging to combine skills from different fine-tunes.

**Q: What models work with PEFT?**
A: Any Hugging Face Transformers or Diffusers model. This includes LLaMA, Mistral, Gemma, Stable Diffusion, and hundreds more.

## Sources

- https://github.com/huggingface/peft
- https://huggingface.co/docs/peft