Introduction
Mergekit is a toolkit by Arcee AI for combining multiple pretrained language models into a single model without additional training. Model merging has become a popular technique in the open source LLM community for creating models that combine the strengths of different fine-tunes, and Mergekit provides the most comprehensive set of merging methods in a single tool.
What Mergekit Does
- Merges two or more pretrained language models into a single checkpoint
- Supports multiple merge strategies including linear, SLERP, TIES, DARE, and passthrough
- Uses an out-of-core approach to merge models larger than available RAM or VRAM
- Outputs merged models in Hugging Face Safetensors format ready for inference or further fine-tuning
- Provides evolutionary merge search (mergekit-evolve) for automated recipe optimization
Architecture Overview
Mergekit processes model weights layer by layer using an out-of-core streaming approach, loading only the tensors needed for the current merge operation. This design allows merging 70B+ parameter models on machines with as little as 8 GB of RAM. Merge operations are defined in YAML configuration files that specify source models, method, and per-tensor or per-layer parameter overrides. GPU acceleration is optional and speeds up tensor interpolation.
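To illustrate the shape of such a recipe (the model names below are placeholders, not a recommended combination), a minimal two-model SLERP config looks roughly like this:

```yaml
# Minimal sketch of a two-model SLERP recipe; model names are placeholders.
models:
  - model: org/base-model
  - model: org/chat-fine-tune
merge_method: slerp
base_model: org/base-model   # SLERP works on exactly two models, one of which is the base
parameters:
  t: 0.5                     # interpolation factor: 0 = base model, 1 = the other model
dtype: bfloat16
```

Saved as a YAML file, this recipe produces a new checkpoint interpolated halfway between the two source models.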
Self-Hosting & Configuration
- Merge recipes are defined in YAML files specifying models, method, and parameters
- Models can be referenced by local path or Hugging Face model ID
- The --cuda flag enables GPU acceleration for faster tensor operations
- Output directory contains a complete Hugging Face-compatible model ready for upload
- Advanced configs support per-layer weight overrides and custom tensor mappings (see the recipe sketch after this list)
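As a rough sketch of those options (paths, model IDs, and layer counts below are placeholders), a recipe can mix a local checkpoint with a Hub model and vary the interpolation factor per layer and per tensor group:

```yaml
# Hypothetical recipe with per-layer overrides; names and layer counts are placeholders.
slices:
  - sources:
      - model: ./my-local-fine-tune       # referenced by local path
        layer_range: [0, 32]
      - model: someorg/hub-fine-tune      # referenced by Hugging Face model ID
        layer_range: [0, 32]
merge_method: slerp
base_model: ./my-local-fine-tune
parameters:
  t:
    - filter: self_attn                   # attention tensors follow this layer-wise schedule
      value: [0.0, 0.3, 0.5, 0.7, 1.0]
    - filter: mlp                         # MLP tensors follow the opposite schedule
      value: [1.0, 0.7, 0.5, 0.3, 0.0]
    - value: 0.5                          # default for all remaining tensors
dtype: bfloat16
```

The merge itself is run with the mergekit-yaml entry point, e.g. mergekit-yaml config.yml ./merged-model --cuda, and the resulting output directory can be uploaded to the Hugging Face Hub as-is.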
Key Features
- Out-of-core merging enables processing models far larger than available memory
- SLERP (Spherical Linear Interpolation) preserves the geometry of weight manifolds
- TIES and DARE methods resolve parameter conflicts between divergent fine-tunes
- Evolutionary merge search automatically optimizes merge recipes against evaluation benchmarks
- Passthrough method enables frankenmerging by stacking layers from different models (sketched below)
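For the passthrough case, a frankenmerge recipe simply lists which layer ranges to stack; here is a sketch with placeholder models and layer ranges:

```yaml
# Hypothetical passthrough ("frankenmerge") recipe; models and layer ranges are placeholders.
slices:
  - sources:
      - model: org/model-a
        layer_range: [0, 24]    # first 24 layers taken from model A
  - sources:
      - model: org/model-b
        layer_range: [8, 32]    # layers 8-31 of model B stacked on top
merge_method: passthrough
dtype: bfloat16
```

Note that the layer counts add up, so the merged model is deeper than either source.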
Comparison with Similar Tools
- PEFT/LoRA merging — Merges adapter weights only; Mergekit merges full model weights for a standalone checkpoint
- Model soups — Simple weight averaging; Mergekit offers more sophisticated interpolation methods
- LM-Cocktail — Merge method from research; Mergekit implements this alongside many other methods
- LazyMergekit — Colab wrapper around Mergekit; simplifies the UI but uses the same underlying library
- Fine-tuning — Trains on new data; merging combines existing capabilities without additional compute
FAQ
Q: Does merging require a GPU? A: No. Mergekit can run entirely on CPU thanks to its out-of-core design. A GPU speeds up the process but is optional.
Q: Will the merged model be as good as fine-tuning? A: Merging combines existing capabilities and can produce strong results, especially when source models are complementary. For learning entirely new tasks, fine-tuning is more appropriate.
Q: What model architectures are supported? A: Mergekit supports most Hugging Face-compatible architectures including Llama, Mistral, Qwen, Phi, and their derivatives.
Q: How do I choose a merge method? A: SLERP is a good default for two models. TIES or DARE work better for merging three or more models by resolving parameter conflicts. Linear is the simplest but can dilute features.
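To make that last answer concrete (model names, densities, and weights below are placeholders), a three-model DARE-TIES recipe assigns each fine-tune a density and weight relative to a shared base model:

```yaml
# Hypothetical three-model DARE-TIES recipe; all names and values are placeholders.
models:
  - model: org/base-model        # base model needs no merge parameters
  - model: org/math-fine-tune
    parameters:
      density: 0.5               # fraction of delta parameters to keep
      weight: 0.5                # contribution of this model's deltas
  - model: org/code-fine-tune
    parameters:
      density: 0.5
      weight: 0.3
merge_method: dare_ties
base_model: org/base-model
parameters:
  normalize: true                # rescale contributions so the weights sum to 1
dtype: bfloat16
```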