# PEFT — Parameter-Efficient Fine-Tuning for Large Language Models

> PEFT is a Hugging Face library for adapting large pre-trained models using parameter-efficient methods such as LoRA, QLoRA, prompt tuning, and prefix tuning. It enables fine-tuning billion-parameter models on consumer hardware by updating only a small fraction of the weights.

## Quick Use

```bash
pip install peft transformers
python -c "
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained('gpt2')
config = LoraConfig(r=8, lora_alpha=32, target_modules=['c_attn'], lora_dropout=0.1)
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
"
```

## Introduction

PEFT (Parameter-Efficient Fine-Tuning) is a Hugging Face library that makes it practical to fine-tune large language models without requiring massive GPU clusters. By updating only a tiny subset of model parameters, PEFT methods such as LoRA achieve results close to full fine-tuning while using a fraction of the memory and compute.

## What PEFT Does

- Applies LoRA (Low-Rank Adaptation) to inject trainable rank-decomposition matrices into model layers
- Supports QLoRA for fine-tuning quantized 4-bit models on a single GPU
- Implements prompt tuning, prefix tuning, and P-tuning for soft-prompt-based adaptation
- Provides adapter methods including IA3 and AdaLoRA for different efficiency-accuracy tradeoffs
- Integrates seamlessly with Hugging Face Transformers, Diffusers, and Accelerate

## Architecture Overview

PEFT wraps a pre-trained model by injecting small trainable modules while freezing the original weights. For LoRA, this means adding low-rank matrices A and B to attention layers such that the effective weight becomes W + BA. During training only A and B are updated, reducing trainable parameters by 99%+.
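The W + BA update can be sketched in a few lines. The following is a minimal, framework-free illustration of the idea, not PEFT's internal implementation; the toy dimensions and the `alpha / r` scaling convention are assumptions for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 64, 64, 8  # toy layer dimensions and LoRA rank
alpha = 32                  # scaling factor (lora_alpha)

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init: update starts at 0

x = rng.standard_normal(d_in)

# LoRA forward pass: frozen path plus scaled low-rank update
h = W @ x + (alpha / r) * (B @ (A @ x))

# Because B starts at zero, the adapted layer initially equals the base layer
assert np.allclose(h, W @ x)

# Only A and B are trained, versus the full d_out x d_in matrix
trainable = A.size + B.size  # 2 * r * 64 = 1024
full = W.size                # 64 * 64 = 4096
print(f"trainable fraction: {trainable / full:.2%}")
```

At real model widths (d in the thousands) with small r, the trainable fraction shrinks to well under 1% per adapted layer.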
The adapter weights are saved separately and can be merged back into the base model for deployment.

## Self-Hosting & Configuration

- Install via pip: `pip install peft` alongside `transformers` and `accelerate`
- A LoRA config requires choosing `r` (rank), `lora_alpha` (scaling), and `target_modules` (which layers to adapt)
- QLoRA setup needs `bitsandbytes` for 4-bit quantization: `pip install bitsandbytes`
- Save adapters with `model.save_pretrained()` — only adapter weights are saved (typically 10-50 MB)
- Load and merge: `PeftModel.from_pretrained(base_model, adapter_path).merge_and_unload()`

## Key Features

- Train 7B+ parameter models on a single consumer GPU with QLoRA
- Multiple adapter methods: LoRA, AdaLoRA, IA3, prompt tuning, prefix tuning
- Adapter composition allows stacking and combining multiple fine-tuned adapters
- Native integration with the Hugging Face Hub for sharing and loading community adapters
- Supports multi-adapter inference for switching between tasks without reloading models

## Comparison with Similar Tools

- **Full Fine-Tuning** — updates all parameters for best accuracy but requires 4-10x more GPU memory
- **Unsloth** — optimized LoRA training with 2x speed gains but narrower model support
- **LLaMA-Factory** — GUI-driven fine-tuning with PEFT methods built in, but less flexible for custom setups
- **Axolotl** — config-driven fine-tuning wrapper that uses PEFT under the hood
- **OpenDelta** — alternative PEFT library from Tsinghua with similar methods but a smaller community

## FAQ

**Q: How much memory does LoRA save compared to full fine-tuning?**
A: LoRA typically reduces trainable parameters by 99%+ and GPU memory by 60-80%. A 7B model that needs 60 GB for full fine-tuning can be LoRA-trained in 16 GB with QLoRA.

**Q: Does LoRA fine-tuning match full fine-tuning quality?**
A: For most downstream tasks, LoRA with rank 16-64 achieves 95-100% of full fine-tuning performance.
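As a rough illustration of how rank scales adapter size, the sketch below assumes a hypothetical 4096-wide projection matrix; real counts depend on the model and on which `target_modules` are adapted:

```python
# LoRA adds two matrices per adapted weight: A (r x d_in) and B (d_out x r),
# so trainable parameters grow linearly with the rank r.
d_in = d_out = 4096  # hypothetical hidden size of one attention projection

for r in (8, 16, 64):
    lora_params = r * d_in + d_out * r  # A plus B for one adapted layer
    full_params = d_in * d_out          # the frozen base matrix
    print(f"r={r:<3} adapter params={lora_params:>9,} "
          f"({lora_params / full_params:.2%} of the layer)")
```

Even at rank 64, the adapter for such a layer stays around 3% of the base weight's parameter count, which is why raising the rank for knowledge-heavy tasks remains cheap.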
Tasks requiring broad knowledge changes may benefit from higher rank.

**Q: Can I combine multiple LoRA adapters?**
A: Yes, PEFT supports adapter composition via `add_adapter()` and weighted merging to combine skills from different fine-tunes.

**Q: What models work with PEFT?**
A: Any Hugging Face Transformers or Diffusers model. This includes LLaMA, Mistral, Gemma, Stable Diffusion, and hundreds more.

## Sources

- https://github.com/huggingface/peft
- https://huggingface.co/docs/peft