Skills · May 10, 2026 · 3 min read

Liger-Kernel — Efficient GPU Kernels for LLM Training

Liger-Kernel provides optimized Triton kernels for LLM training that reduce GPU memory usage and improve throughput, serving as drop-in replacements for standard HuggingFace Transformers layers.

Agent-ready

This asset can be read and installed directly by agents

TokRepo exposes a universal CLI command, an install contract, JSON metadata, an adapter-specific plan, and the raw content to help agents assess fit, risk, and next actions.

Native · 98/100 · Policy: allow
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established
Entry point: Liger-Kernel Overview
Universal CLI command: npx tokrepo install fa6e0b07-4c49-11f1-9bc6-00163e2b0d79

Introduction

Liger-Kernel is a collection of Triton GPU kernels purpose-built for large language model training. It optimizes the most memory-intensive and compute-heavy operations in transformer architectures, delivering significant memory savings and throughput improvements with a single function call.

What Liger-Kernel Does

  • Replaces standard RMSNorm with a fused Triton kernel that avoids intermediate allocations
  • Implements a fused SwiGLU activation that halves memory usage compared to the naive version
  • Provides a chunked cross-entropy loss that processes logits in tiles, so the full logits tensor over the vocabulary never has to be held in memory at once
  • Optimizes rotary positional embedding (RoPE) computation with a fused kernel
  • Supports FusedLinearCrossEntropy that combines the final linear projection and loss in one pass
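The tiled cross-entropy idea can be pictured with a small pure-Python sketch. This is not Liger-Kernel's Triton code: the function names and the `chunk_size` parameter are hypothetical, and the logits here are an ordinary list that already sits in memory. In a fused kernel each tile would be produced on the fly, which is where the memory saving would come from. The sketch only shows that a running log-sum-exp over tiles gives the same loss as the all-at-once computation.

```python
import math

def cross_entropy_full(logits, target):
    """Reference: log-sum-exp over the whole vocabulary at once."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return lse - logits[target]

def cross_entropy_chunked(logits, target, chunk_size=4):
    """Same loss, visiting the logits one tile at a time (hypothetical helper)."""
    running_max = float("-inf")
    running_sum = 0.0  # sum of exp(x - running_max) over the tiles seen so far
    for start in range(0, len(logits), chunk_size):
        tile = logits[start:start + chunk_size]
        new_max = max(running_max, max(tile))
        # Rescale the old sum to the new maximum, then add this tile's terms.
        running_sum *= math.exp(running_max - new_max)
        running_sum += sum(math.exp(x - new_max) for x in tile)
        running_max = new_max
    return running_max + math.log(running_sum) - logits[target]

logits = [2.0, -1.0, 0.5, 3.0, 0.0, -2.5, 1.5, 0.25, -0.75, 4.0]
full = cross_entropy_full(logits, target=3)
tiled = cross_entropy_chunked(logits, target=3, chunk_size=4)
assert math.isclose(full, tiled, rel_tol=1e-9)
```

Only the running maximum and running sum are carried between tiles, so the working set is bounded by the tile size rather than by the vocabulary size.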

Architecture Overview

Liger-Kernel writes each optimized operation as a Triton kernel that fuses multiple elementwise and reduction steps into a single GPU launch. The apply_liger_kernel_to_* functions monkey-patch the HuggingFace Transformers model classes, replacing standard PyTorch modules with Liger equivalents. No changes to training scripts are required beyond the one-line apply call. Kernels are compiled just-in-time by Triton and cached for subsequent runs.
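The patching mechanism described above can be pictured with a plain-Python sketch. Everything in it is hypothetical (`modeling`, `RMSNorm`, `FusedRMSNorm`, `apply_fused_kernels`, `build_model`); it only illustrates swapping a class on a module-like namespace and makes no claim about how Liger-Kernel implements its own patch functions.

```python
import math
import types

# Hypothetical stand-in for a library's modeling module.
modeling = types.SimpleNamespace()

class RMSNorm:
    """Naive version: materializes an intermediate list of squares."""
    def __init__(self, eps=1e-6):
        self.eps = eps

    def __call__(self, x):
        squares = [v * v for v in x]
        scale = 1.0 / math.sqrt(sum(squares) / len(x) + self.eps)
        return [v * scale for v in x]

class FusedRMSNorm(RMSNorm):
    """Stand-in for an optimized replacement with the same interface."""
    def __call__(self, x):
        # Squares are consumed as they are produced; no intermediate list.
        scale = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + self.eps)
        return [v * scale for v in x]

modeling.RMSNorm = RMSNorm

def build_model():
    # The class is looked up on the namespace at construction time.
    return modeling.RMSNorm()

def apply_fused_kernels():
    # The "one-line apply call": rebind the name to the replacement class.
    modeling.RMSNorm = FusedRMSNorm

before = build_model()
apply_fused_kernels()
after = build_model()
```

In this sketch, `before` keeps the naive class while `after` picks up the replacement, which suggests why a class-level patch is best applied before the model is constructed.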

Self-Hosting & Configuration

  • Install via pip (the PyPI package is liger-kernel); requires PyTorch 2.x and a Triton-compatible NVIDIA GPU
  • Call apply_liger_kernel_to_llama(), apply_liger_kernel_to_mistral(), or the model-specific variant
  • Works with HuggingFace Transformers, TRL, and other training frameworks without modification
  • Individual kernels can be imported and used standalone for custom model architectures
  • Compatible with DeepSpeed ZeRO, FSDP, and other distributed training strategies
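A minimal sketch of the setup steps above, assuming the package has been installed with `pip install liger-kernel`. The wrapper `enable_liger_for_llama` is a hypothetical helper added here so the snippet degrades gracefully on machines without the package; only `apply_liger_kernel_to_llama` comes from Liger-Kernel itself.

```python
def enable_liger_for_llama():
    """Patch HuggingFace LLaMA classes with Liger kernels, if available.

    Returns True when the patch was applied and False when liger_kernel
    (or one of its dependencies) could not be imported.
    """
    try:
        from liger_kernel.transformers import apply_liger_kernel_to_llama
    except ImportError:
        return False
    # Assumption: the patch acts on the Transformers model classes, so the
    # safe ordering is to call it before the model is instantiated.
    apply_liger_kernel_to_llama()
    return True

patched = enable_liger_for_llama()
print("Liger kernels active:", patched)
```

After a successful call, the model would be loaded and trained as usual, for example through `AutoModelForCausalLM.from_pretrained` and an existing training loop.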

Key Features

  • Up to 20% throughput improvement and 60% memory reduction on LLaMA training
  • One-line integration with no changes to model code or training loops
  • Supports LLaMA, Mistral, Gemma, Qwen, and Phi model families
  • Mathematically equivalent outputs with full backward pass support
  • Composable kernels that work independently or together

Comparison with Similar Tools

  • Flash Attention — optimizes attention computation; Liger-Kernel optimizes non-attention layers like norms, activations, and loss
  • Unsloth — full training framework with kernel optimizations; Liger-Kernel provides standalone drop-in kernels
  • xformers — memory-efficient attention and ops by Meta; Liger-Kernel focuses on LLM-specific fused operations
  • DeepSpeed — distributed training framework; Liger-Kernel complements it with kernel-level optimizations
  • torch.compile — general JIT compilation; Liger-Kernel provides hand-tuned Triton kernels for specific LLM operations

FAQ

Q: Which GPUs are supported? A: Any NVIDIA GPU supported by Triton, typically Ampere (A100) and newer. Older GPUs may work but with reduced benefits.

Q: Does Liger-Kernel change model outputs? A: Not in any meaningful way. The kernels are mathematically equivalent to the standard implementations; any numerical differences stay within floating-point tolerance.
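To illustrate what "within floating-point tolerance" means in practice, here is a small pure-Python sketch. It assumes, for illustration only, that a fused kernel may combine partial sums in a different order than the reference code; the helper names are hypothetical. Reordering a reduction is mathematically neutral, but the results are compared with a tolerance rather than bit for bit.

```python
import math

def sum_squares_sequential(xs):
    """Reference reduction: accumulate left to right."""
    total = 0.0
    for x in xs:
        total += x * x
    return total

def sum_squares_blocked(xs, block=4):
    """Reduce each block first, then combine the partial sums."""
    partials = [
        sum(x * x for x in xs[i:i + block])
        for i in range(0, len(xs), block)
    ]
    return sum(partials)

values = [0.1 * (i + 1) for i in range(32)]
a = sum_squares_sequential(values)
b = sum_squares_blocked(values)
# Compare with a relative tolerance instead of exact equality.
assert math.isclose(a, b, rel_tol=1e-9)
```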

Q: Can I use Liger-Kernel for inference? A: The kernels are designed for training workloads. For inference optimization, tools like vLLM or TensorRT-LLM are more appropriate.

Q: Does it work with LoRA and QLoRA fine-tuning? A: Yes. Since Liger-Kernel patches the base model layers, it works transparently with PEFT adapters including LoRA and QLoRA.
