Scripts · May 15, 2026 · 3 min read

Mergekit — Toolkit for Merging Pretrained LLMs

Mergekit is an open source library for merging pretrained large language models. It supports multiple merge methods including SLERP, TIES, DARE, and linear interpolation, and can run entirely on CPU with minimal memory.

Ready for agents

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, a per-adapter plan, and raw content so agents can evaluate compatibility, risk, and next steps.

Native · 98/100 · Policy: allow
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established
Entry: Mergekit

Universal CLI command
npx tokrepo install 0a9ca395-5017-11f1-9bc6-00163e2b0d79

Introduction

Mergekit is a toolkit by Arcee AI for combining multiple pretrained language models into a single model without additional training. Model merging has become a popular technique in the open source LLM community for creating models that combine the strengths of different fine-tunes, and Mergekit provides the most comprehensive set of merging methods in a single tool.

What Mergekit Does

  • Merges two or more pretrained language models into a single checkpoint
  • Supports multiple merge strategies including linear, SLERP, TIES, DARE, and passthrough
  • Uses an out-of-core approach to merge models larger than available RAM or VRAM
  • Outputs merged models in Hugging Face Safetensors format ready for inference or further fine-tuning
  • Provides evolutionary merge search (mergekit-evolve) for automated recipe optimization

Architecture Overview

Mergekit processes model weights layer by layer using an out-of-core streaming approach, loading only the tensors needed for the current merge operation. This design allows merging 70B+ parameter models on machines with as little as 8 GB of RAM. Merge operations are defined in YAML configuration files that specify source models, method, and per-tensor or per-layer parameter overrides. GPU acceleration is optional and speeds up tensor interpolation.
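As a minimal sketch of what such a YAML recipe looks like (the model names here are placeholders, not real checkpoints), a simple linear merge of two models might be written as:

```yaml
# Hypothetical linear merge; model names are placeholders for
# Hugging Face IDs or local paths.
models:
  - model: org/model-a
    parameters:
      weight: 0.6        # contribution of model-a to the weighted average
  - model: org/model-b
    parameters:
      weight: 0.4
merge_method: linear
dtype: float16           # precision of the output tensors
```

Each tensor in the output is the weighted average of the corresponding tensors in the source models, computed one tensor at a time so the full models never need to be resident in memory.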

Self-Hosting & Configuration

  • Merge recipes are defined in YAML files specifying models, method, and parameters
  • Models can be referenced by local path or Hugging Face model ID
  • The --cuda flag enables GPU acceleration for faster tensor operations
  • Output directory contains a complete Hugging Face-compatible model ready for upload
  • Advanced configs support per-layer weight overrides and custom tensor mappings
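To illustrate the per-layer override point above, here is a hedged sketch of a SLERP recipe where the interpolation factor `t` varies by tensor type and layer depth (model names are placeholders; the gradient lists are interpolated across the layer range):

```yaml
# Hypothetical SLERP recipe with per-tensor parameter overrides.
slices:
  - sources:
      - model: org/model-a
        layer_range: [0, 32]
      - model: org/model-b
        layer_range: [0, 32]
merge_method: slerp
base_model: org/model-a    # tokenizer and config are taken from this model
parameters:
  t:
    - filter: self_attn    # attention tensors follow this depth gradient
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp          # MLP tensors use the inverse gradient
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5           # all remaining tensors: even blend
dtype: bfloat16
# Run (CPU by default; --cuda enables GPU acceleration):
#   mergekit-yaml config.yml ./output-model --cuda
```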

Key Features

  • Out-of-core merging enables processing models far larger than available memory
  • SLERP (Spherical Linear Interpolation) preserves the geometry of weight manifolds
  • TIES and DARE methods resolve parameter conflicts between divergent fine-tunes
  • Evolutionary merge search automatically optimizes merge recipes against evaluation benchmarks
  • Passthrough method enables frankenmerging by stacking layers from different models
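The passthrough/frankenmerging idea can be sketched as stacking layer slices from different models into a deeper checkpoint. A hedged example (placeholder model names, assuming two 32-layer models):

```yaml
# Hypothetical frankenmerge: stack layer slices via passthrough.
slices:
  - sources:
      - model: org/model-a
        layer_range: [0, 24]   # layers 0-23 taken from model-a
  - sources:
      - model: org/model-b
        layer_range: [8, 32]   # layers 8-31 from model-b, stacked after
merge_method: passthrough      # copies tensors through without blending
dtype: bfloat16
```

The resulting model has more layers than either source; whether it performs well depends heavily on how compatible the stacked slices are.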

Comparison with Similar Tools

  • PEFT/LoRA merging — Merges adapter weights only; Mergekit merges full model weights for a standalone checkpoint
  • Model soups — Simple weight averaging; Mergekit offers more sophisticated interpolation methods
  • LM-Cocktail — A merge method from the research literature; Mergekit implements it alongside many other methods
  • LazyMergekit — Colab wrapper around Mergekit; simplifies the UI but uses the same underlying library
  • Fine-tuning — Trains on new data; merging combines existing capabilities without additional compute

FAQ

Q: Does merging require a GPU? A: No. Mergekit can run entirely on CPU thanks to its out-of-core design. A GPU speeds up the process but is optional.

Q: Will the merged model be as good as fine-tuning? A: Merging combines existing capabilities and can produce strong results, especially when source models are complementary. For learning entirely new tasks, fine-tuning is more appropriate.

Q: What model architectures are supported? A: Mergekit supports most Hugging Face-compatible architectures including Llama, Mistral, Qwen, Phi, and their derivatives.

Q: How do I choose a merge method? A: SLERP is a good default for two models. TIES or DARE work better for merging three or more models by resolving parameter conflicts. Linear is the simplest but can dilute features.
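To make the three-plus-model case concrete, a hedged sketch of a TIES recipe (placeholder model names; `density` controls how aggressively low-magnitude parameter deltas are pruned before conflicts are resolved):

```yaml
# Hypothetical three-model TIES merge over a shared base model.
models:
  - model: org/fine-tune-a
    parameters:
      density: 0.5       # keep the top 50% of delta parameters
      weight: 0.4
  - model: org/fine-tune-b
    parameters:
      density: 0.5
      weight: 0.3
  - model: org/fine-tune-c
    parameters:
      density: 0.5
      weight: 0.3
merge_method: ties
base_model: org/base-model   # common ancestor the fine-tunes diverge from
parameters:
  normalize: true            # rescale weights to sum to 1
dtype: float16
```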

