Scripts · May 15, 2026 · 3 min read

Mergekit — Toolkit for Merging Pretrained LLMs

Mergekit is an open source library for merging pretrained large language models. It supports multiple merge methods including SLERP, TIES, DARE, and linear interpolation, and can run entirely on CPU with minimal memory.

Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, a per-adapter plan, and the raw content to help agents assess fit, risk, and next actions.

Native · 98/100 · Policy: allow
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established
Entry point: Mergekit
Universal CLI command
npx tokrepo install 0a9ca395-5017-11f1-9bc6-00163e2b0d79

Introduction

Mergekit is a toolkit by Arcee AI for combining multiple pretrained language models into a single model without additional training. Model merging has become a popular technique in the open source LLM community for creating models that combine the strengths of different fine-tunes, and Mergekit provides the most comprehensive set of merging methods in a single tool.

What Mergekit Does

  • Merges two or more pretrained language models into a single checkpoint
  • Supports multiple merge strategies including linear, SLERP, TIES, DARE, and passthrough (a minimal recipe follows this list)
  • Uses an out-of-core approach to merge models larger than available RAM or VRAM
  • Outputs merged models in Hugging Face Safetensors format ready for inference or further fine-tuning
  • Provides evolutionary merge search (mergekit-evolve) for automated recipe optimization
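
To make the recipe format concrete, the sketch below shows a minimal linear merge of two models. The keys follow the YAML schema described in the Mergekit documentation, but the model IDs and weights are placeholders chosen for illustration, not a tuned recipe.

# Minimal linear merge: a weighted average of two same-architecture models.
# Model IDs are placeholders; substitute Hugging Face IDs or local paths.
models:
  - model: example-org/model-a
    parameters:
      weight: 0.5        # contribution of model-a to the average
  - model: example-org/model-b
    parameters:
      weight: 0.5        # contribution of model-b to the average
merge_method: linear     # simplest method: per-tensor weighted averaging
dtype: float16           # precision of the written checkpoint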

Architecture Overview

Mergekit processes model weights layer by layer using an out-of-core streaming approach, loading only the tensors needed for the current merge operation. This design allows merging 70B+ parameter models on machines with as little as 8 GB of RAM. Merge operations are defined in YAML configuration files that specify source models, method, and per-tensor or per-layer parameter overrides. GPU acceleration is optional and speeds up tensor interpolation.
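
As a fuller sketch of what these YAML files look like, the configuration below describes a SLERP merge with per-layer and per-tensor-group overrides, modeled on the examples in the Mergekit README. The model IDs, the 32-layer range, and the interpolation values are illustrative assumptions.

# SLERP merge of two 32-layer models. The interpolation factor t varies
# across the layer stack and per tensor group. Names and values are placeholders.
slices:
  - sources:
      - model: example-org/model-a
        layer_range: [0, 32]
      - model: example-org/model-b
        layer_range: [0, 32]
merge_method: slerp
base_model: example-org/model-a   # supplies the config/tokenizer for the output
parameters:
  t:
    - filter: self_attn           # attention tensors: blend shifts across layers
      value: [0.0, 0.5, 0.3, 0.7, 1.0]
    - filter: mlp                 # MLP tensors: mirrored schedule
      value: [1.0, 0.5, 0.7, 0.3, 0.0]
    - value: 0.5                  # default t for all remaining tensors
dtype: bfloat16

The value lists are anchor points interpolated across the layer range, so t = 0 keeps model-a's tensors unchanged and t = 1 takes model-b's.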

Self-Hosting & Configuration

  • Merge recipes are defined in YAML files specifying models, method, and parameters
  • Models can be referenced by local path or Hugging Face model ID
  • The --cuda flag enables GPU acceleration for faster tensor operations (see the invocation example after this list)
  • Output directory contains a complete Hugging Face-compatible model ready for upload
  • Advanced configs support per-layer weight overrides and custom tensor mappings
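
Assuming a recipe saved as config.yml, a typical invocation looks like the following. mergekit-yaml and the --cuda flag are documented entry points; the paths here are placeholders, and the install line is the usual PyPI route, so treat this as a sketch rather than a definitive setup.

# Install Mergekit, then execute a merge recipe (paths are placeholders).
pip install mergekit
mergekit-yaml config.yml ./merged-model --cuda   # drop --cuda for CPU-only merging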

Key Features

  • Out-of-core merging enables processing models far larger than available memory
  • SLERP (Spherical Linear Interpolation) preserves the geometry of weight manifolds (see the formula after this list)
  • TIES and DARE methods resolve parameter conflicts between divergent fine-tunes
  • Evolutionary merge search automatically optimizes merge recipes against evaluation benchmarks
  • Passthrough method enables frankenmerging by stacking layers from different models
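
For intuition, the standard SLERP formula, applied here to corresponding flattened weight vectors w_0 and w_1 with the interpolation factor t from the recipe, is:

\mathrm{slerp}(t; w_0, w_1)
  = \frac{\sin\big((1-t)\,\theta\big)}{\sin\theta}\, w_0
  + \frac{\sin(t\,\theta)}{\sin\theta}\, w_1,
\qquad
\cos\theta = \frac{\langle w_0, w_1 \rangle}{\lVert w_0 \rVert \, \lVert w_1 \rVert}

Unlike straight linear averaging, the interpolant stays on the arc between the two vectors, which preserves the magnitude and angular relationships that a straight-line average can wash out.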

Comparison with Similar Tools

  • PEFT/LoRA merging — Merges adapter weights only; Mergekit merges full model weights for a standalone checkpoint
  • Model soups — Simple weight averaging; Mergekit offers more sophisticated interpolation methods
  • LM-Cocktail — A merge method from the research literature; Mergekit implements it alongside many other methods
  • LazyMergekit — Colab wrapper around Mergekit; simplifies the UI but uses the same underlying library
  • Fine-tuning — Trains on new data; merging combines existing capabilities without additional compute

FAQ

Q: Does merging require a GPU? A: No. Mergekit can run entirely on CPU thanks to its out-of-core design. A GPU speeds up the process but is optional.

Q: Will the merged model be as good as fine-tuning? A: Merging combines existing capabilities and can produce strong results, especially when source models are complementary. For learning entirely new tasks, fine-tuning is more appropriate.

Q: What model architectures are supported? A: Mergekit supports most Hugging Face-compatible architectures including Llama, Mistral, Qwen, Phi, and their derivatives.

Q: How do I choose a merge method? A: SLERP is a good default for two models. TIES or DARE work better for merging three or more models by resolving parameter conflicts. Linear is the simplest but can dilute features.
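
As a sketch of the multi-model case from the last answer, a TIES recipe might look like the following. density and weight are the documented TIES parameters (density sets what fraction of each fine-tune's delta from the base survives pruning); the model IDs and numeric values are placeholders, not a recommendation.

# TIES merge of two fine-tunes back onto their shared base model.
# All model IDs and numeric values below are illustrative placeholders.
models:
  - model: example-org/base-model    # shared base; deltas are computed against it
  - model: example-org/fine-tune-a
    parameters:
      density: 0.5                   # keep the top 50% of delta magnitudes
      weight: 0.5                    # scaling of this model's contribution
  - model: example-org/fine-tune-b
    parameters:
      density: 0.5
      weight: 0.3
merge_method: ties
base_model: example-org/base-model
parameters:
  normalize: true                    # renormalize summed weights after sign resolution
dtype: float16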

