# Liger-Kernel — Efficient GPU Kernels for LLM Training

> Liger-Kernel provides optimized Triton kernels for LLM training that reduce GPU memory usage and improve throughput, serving as drop-in replacements for standard HuggingFace Transformers layers.

## Quick Use

```bash
pip install liger-kernel
python -c "
from liger_kernel.transformers import apply_liger_kernel_to_llama
apply_liger_kernel_to_llama()
print('Liger kernels applied to LLaMA model class')

from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-2-7b-hf')
# Model now uses optimized Liger kernels for RMSNorm, SwiGLU, CrossEntropy, etc.
"
```

## Introduction

Liger-Kernel is a collection of Triton GPU kernels purpose-built for large language model training. It optimizes the most memory-intensive and compute-heavy operations in transformer architectures, delivering significant memory savings and throughput improvements with a single function call.

## What Liger-Kernel Does

- Replaces standard RMSNorm with a fused Triton kernel that avoids intermediate allocations
- Implements a fused SwiGLU activation that halves memory usage compared to the naive version
- Provides a chunked cross-entropy loss that processes logits in tiles to avoid materializing the full vocabulary matrix
- Optimizes rotary positional embedding (RoPE) computation with a fused kernel
- Supports FusedLinearCrossEntropy, which combines the final linear projection and the loss computation in a single pass (a conceptual sketch follows the Architecture Overview below)

## Architecture Overview

Liger-Kernel implements each optimized operation as a Triton kernel that fuses multiple elementwise and reduction steps into a single GPU launch. The `apply_liger_kernel_to_*` functions monkey-patch the HuggingFace Transformers model classes, replacing standard PyTorch modules with Liger equivalents. No changes to training scripts are required beyond the one-line apply call. Kernels are compiled just-in-time by Triton and cached for subsequent runs.
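To make the chunked loss idea concrete, here is a minimal PyTorch sketch of the forward pass of a chunked linear-plus-cross-entropy computation. This is an illustration of the technique only, not Liger-Kernel's Triton implementation; the function name and chunk size are made up for the example, and a real fused kernel also handles the backward pass within the same chunked loop, which is where the training-time memory savings come from.

```python
import torch
import torch.nn.functional as F


def chunked_linear_cross_entropy(hidden, lm_head_weight, targets, chunk_size=1024):
    """Concept sketch: project hidden states to vocabulary logits one chunk of
    tokens at a time, so the full (num_tokens x vocab_size) logit matrix is
    never materialized. Not Liger-Kernel's actual kernel."""
    num_tokens = hidden.shape[0]
    total_loss = hidden.new_zeros(())
    for start in range(0, num_tokens, chunk_size):
        end = min(start + chunk_size, num_tokens)
        chunk_logits = hidden[start:end] @ lm_head_weight.T  # (chunk, vocab)
        total_loss = total_loss + F.cross_entropy(
            chunk_logits, targets[start:end], reduction="sum"
        )
    return total_loss / num_tokens


# Toy usage: 4096 tokens, hidden size 512, vocabulary size 32000
hidden = torch.randn(4096, 512)
weight = torch.randn(32000, 512)
targets = torch.randint(0, 32000, (4096,))
print(chunked_linear_cross_entropy(hidden, weight, targets))
```

With a 32,000-token vocabulary and long sequences, the full logit matrix dominates activation memory; processing it in tiles keeps only one chunk of logits alive at a time, which is the effect the fused Liger kernels aim for.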
## Self-Hosting & Configuration

- Install via pip; requires PyTorch 2.x and a Triton-compatible NVIDIA GPU
- Call apply_liger_kernel_to_llama(), apply_liger_kernel_to_mistral(), or the variant for your model family
- Works with HuggingFace Transformers, TRL, and other training frameworks without modification
- Individual kernels can be imported and used standalone for custom model architectures (see the sketch at the end of this document)
- Compatible with DeepSpeed ZeRO, FSDP, and other distributed training strategies

## Key Features

- Up to 20% throughput improvement and 60% memory reduction on LLaMA training
- One-line integration with no changes to model code or training loops
- Supports the LLaMA, Mistral, Gemma, Qwen, and Phi model families
- Mathematically equivalent outputs with full backward pass support
- Composable kernels that work independently or together

## Comparison with Similar Tools

- **Flash Attention** — optimizes attention computation; Liger-Kernel optimizes non-attention layers such as norms, activations, and loss
- **Unsloth** — full training framework with kernel optimizations; Liger-Kernel provides standalone drop-in kernels
- **xformers** — memory-efficient attention and ops by Meta; Liger-Kernel focuses on LLM-specific fused operations
- **DeepSpeed** — distributed training framework; Liger-Kernel complements it with kernel-level optimizations
- **torch.compile** — general JIT compilation; Liger-Kernel provides hand-tuned Triton kernels for specific LLM operations

## FAQ

**Q: Which GPUs are supported?**
A: Any NVIDIA GPU supported by Triton, typically Ampere (A100) and newer. Older GPUs may work but with reduced benefits.

**Q: Does Liger-Kernel change model outputs?**
A: No. The kernels are mathematically equivalent to the standard implementations. Numerical differences are within floating-point tolerance.

**Q: Can I use Liger-Kernel for inference?**
A: The kernels are designed for training workloads. For inference optimization, tools like vLLM or TensorRT-LLM are more appropriate.

**Q: Does it work with LoRA and QLoRA fine-tuning?**
A: Yes. Since Liger-Kernel patches the base model layers, it works transparently with PEFT adapters, including LoRA and QLoRA.

## Sources

- https://github.com/linkedin/Liger-Kernel
- https://arxiv.org/abs/2410.10989
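As a companion to the standalone-kernel bullet in Self-Hosting & Configuration above, here is a hedged sketch of dropping a single Liger module into a custom architecture. The `LigerRMSNorm` class name and its `(hidden_size, eps)` constructor are assumptions based on the project's naming scheme and are not confirmed by this document; check the Liger-Kernel repository for the exact import path and signature before relying on it.

```python
import torch
import torch.nn as nn

# Assumption: liger_kernel.transformers exposes a LigerRMSNorm module taking
# (hidden_size, eps); verify the import path and signature against the repo.
from liger_kernel.transformers import LigerRMSNorm


class TinyBlock(nn.Module):
    """Toy custom block that swaps a hand-rolled RMSNorm for Liger's version."""

    def __init__(self, hidden_size: int = 256):
        super().__init__()
        self.norm = LigerRMSNorm(hidden_size, eps=1e-6)
        self.proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(self.norm(x))


if __name__ == "__main__":
    block = TinyBlock().cuda()                   # Triton kernels require a CUDA GPU
    x = torch.randn(8, 128, 256, device="cuda")  # (batch, seq_len, hidden_size)
    print(block(x).shape)                        # expected: torch.Size([8, 128, 256])
```

The same pattern applies to the other kernels: build your module tree as usual and substitute the Liger equivalent where a standard norm, activation, or loss would go.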