# Liger-Kernel — Efficient GPU Kernels for LLM Training

> Liger-Kernel provides optimized Triton kernels for LLM training that reduce GPU memory usage and improve throughput, serving as drop-in replacements for standard HuggingFace Transformers layers.

## Quick Use

```bash
pip install liger-kernel
python -c "
from liger_kernel.transformers import apply_liger_kernel_to_llama
apply_liger_kernel_to_llama()
print('Liger kernels applied to LLaMA model class')

from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-2-7b-hf')
# Model now uses optimized Liger kernels for RMSNorm, SwiGLU, CrossEntropy, etc.
"
```

## Introduction

Liger-Kernel is a collection of Triton GPU kernels purpose-built for large language model training. It optimizes the most memory-intensive and compute-heavy operations in transformer architectures, delivering significant memory savings and throughput improvements with a single function call.

## What Liger-Kernel Does

- Replaces standard RMSNorm with a fused Triton kernel that avoids intermediate allocations
- Implements a fused SwiGLU activation that halves memory usage compared to the naive version
- Provides a chunked cross-entropy loss that processes logits in tiles to avoid materializing the full vocabulary matrix
- Optimizes rotary positional embedding (RoPE) computation with a fused kernel
- Supports FusedLinearCrossEntropy, which combines the final linear projection and the loss computation in a single pass (a conceptual sketch follows the Architecture Overview below)

## Architecture Overview

Liger-Kernel implements each optimized operation as a Triton kernel that fuses multiple elementwise and reduction steps into a single GPU launch. The `apply_liger_kernel_to_*` functions monkey-patch the HuggingFace Transformers model classes, replacing standard PyTorch modules with Liger equivalents. No changes to training scripts are required beyond the one-line apply call. Kernels are compiled just-in-time by Triton and cached for subsequent runs.
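To make the chunked loss idea concrete, here is a minimal PyTorch sketch of the forward pass of a chunked linear-plus-cross-entropy computation. This is an illustration of the technique only, not Liger-Kernel's Triton implementation; the function name and chunk size are made up for the example, and a real fused kernel also handles the backward pass within the same chunked loop, which is where the training-time memory savings come from.

```python
import torch
import torch.nn.functional as F


def chunked_linear_cross_entropy(hidden, lm_head_weight, targets, chunk_size=1024):
    """Concept sketch: project hidden states to vocabulary logits one chunk of
    tokens at a time, so the full (num_tokens x vocab_size) logit matrix is
    never materialized. Not Liger-Kernel's actual kernel."""
    num_tokens = hidden.shape[0]
    total_loss = hidden.new_zeros(())
    for start in range(0, num_tokens, chunk_size):
        end = min(start + chunk_size, num_tokens)
        chunk_logits = hidden[start:end] @ lm_head_weight.T  # (chunk, vocab)
        total_loss = total_loss + F.cross_entropy(
            chunk_logits, targets[start:end], reduction="sum"
        )
    return total_loss / num_tokens


# Toy usage: 4096 tokens, hidden size 512, vocabulary size 32000
hidden = torch.randn(4096, 512)
weight = torch.randn(32000, 512)
targets = torch.randint(0, 32000, (4096,))
print(chunked_linear_cross_entropy(hidden, weight, targets))
```

With a 32,000-token vocabulary and long sequences, the full logit matrix dominates activation memory; processing it in tiles keeps only one chunk of logits alive at a time, which is the effect the fused Liger kernels aim for.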
## Self-Hosting & Configuration

- Install via pip; requires PyTorch 2.x and a Triton-compatible NVIDIA GPU
- Call apply_liger_kernel_to_llama(), apply_liger_kernel_to_mistral(), or the variant for your model family
- Works with HuggingFace Transformers, TRL, and other training frameworks without modification
- Individual kernels can be imported and used standalone for custom model architectures (see the sketch at the end of this document)
- Compatible with DeepSpeed ZeRO, FSDP, and other distributed training strategies

## Key Features

- Up to 20% throughput improvement and 60% memory reduction on LLaMA training
- One-line integration with no changes to model code or training loops
- Supports the LLaMA, Mistral, Gemma, Qwen, and Phi model families
- Mathematically equivalent outputs with full backward pass support
- Composable kernels that work independently or together

## Comparison with Similar Tools

- **Flash Attention** — optimizes attention computation; Liger-Kernel optimizes non-attention layers such as norms, activations, and loss
- **Unsloth** — full training framework with kernel optimizations; Liger-Kernel provides standalone drop-in kernels
- **xformers** — memory-efficient attention and ops by Meta; Liger-Kernel focuses on LLM-specific fused operations
- **DeepSpeed** — distributed training framework; Liger-Kernel complements it with kernel-level optimizations
- **torch.compile** — general JIT compilation; Liger-Kernel provides hand-tuned Triton kernels for specific LLM operations

## FAQ

**Q: Which GPUs are supported?**
A: Any NVIDIA GPU supported by Triton, typically Ampere (A100) and newer. Older GPUs may work but with reduced benefits.

**Q: Does Liger-Kernel change model outputs?**
A: No. The kernels are mathematically equivalent to the standard implementations. Numerical differences are within floating-point tolerance.

**Q: Can I use Liger-Kernel for inference?**
A: The kernels are designed for training workloads. For inference optimization, tools like vLLM or TensorRT-LLM are more appropriate.

**Q: Does it work with LoRA and QLoRA fine-tuning?**
A: Yes. Since Liger-Kernel patches the base model layers, it works transparently with PEFT adapters, including LoRA and QLoRA.

## Sources

- https://github.com/linkedin/Liger-Kernel
- https://arxiv.org/abs/2410.10989
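As a companion to the standalone-kernel bullet in Self-Hosting & Configuration above, here is a hedged sketch of dropping a single Liger module into a custom architecture. The `LigerRMSNorm` class name and its `(hidden_size, eps)` constructor are assumptions based on the project's naming scheme and are not confirmed by this document; check the Liger-Kernel repository for the exact import path and signature before relying on it.

```python
import torch
import torch.nn as nn

# Assumption: liger_kernel.transformers exposes a LigerRMSNorm module taking
# (hidden_size, eps); verify the import path and signature against the repo.
from liger_kernel.transformers import LigerRMSNorm


class TinyBlock(nn.Module):
    """Toy custom block that swaps a hand-rolled RMSNorm for Liger's version."""

    def __init__(self, hidden_size: int = 256):
        super().__init__()
        self.norm = LigerRMSNorm(hidden_size, eps=1e-6)
        self.proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(self.norm(x))


if __name__ == "__main__":
    block = TinyBlock().cuda()                   # Triton kernels require a CUDA GPU
    x = torch.randn(8, 128, 256, device="cuda")  # (batch, seq_len, hidden_size)
    print(block(x).shape)                        # expected: torch.Size([8, 128, 256])
```

The same pattern applies to the other kernels: build your module tree as usual and substitute the Liger equivalent where a standard norm, activation, or loss would go.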