# Mergekit — Toolkit for Merging Pretrained LLMs

> Mergekit is an open source library for merging pretrained large language models. It supports multiple merge methods, including SLERP, TIES, DARE, and linear interpolation, and can run entirely on CPU with minimal memory.

## Quick Use

```bash
# Install from PyPI
pip install mergekit

# Create a merge config (YAML)
cat > config.yml << EOF
models:
  - model: model_a
    parameters:
      weight: 0.6
  - model: model_b
    parameters:
      weight: 0.4
merge_method: linear
dtype: float16
EOF

# Run the merge
mergekit-yaml config.yml ./merged-model --cuda
```

## Introduction

Mergekit is a toolkit by Arcee AI for combining multiple pretrained language models into a single model without additional training. Model merging has become a popular technique in the open source LLM community for creating models that combine the strengths of different fine-tunes, and Mergekit provides one of the most comprehensive sets of merging methods available in a single tool.

## What Mergekit Does

- Merges two or more pretrained language models into a single checkpoint
- Supports multiple merge strategies, including linear, SLERP, TIES, DARE, and passthrough
- Uses an out-of-core approach to merge models larger than available RAM or VRAM
- Outputs merged models in Hugging Face Safetensors format, ready for inference or further fine-tuning
- Provides evolutionary merge search (`mergekit-evolve`) for automated recipe optimization

## Architecture Overview

Mergekit processes model weights layer by layer using an out-of-core streaming approach, loading only the tensors needed for the current merge operation. This design allows merging 70B+ parameter models on machines with as little as 8 GB of RAM. Merge operations are defined in YAML configuration files that specify the source models, the merge method, and per-tensor or per-layer parameter overrides. GPU acceleration is optional and speeds up tensor interpolation.
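The out-of-core idea described above can be illustrated with a minimal pure-Python sketch: tensors are loaded, combined, and discarded one name at a time, so peak memory stays bounded by a single tensor regardless of total model size. The `load_tensor` helper and the dict-based "models" here are hypothetical stand-ins for illustration only; Mergekit itself streams safetensors shards from disk.

```python
# Sketch of out-of-core linear merging. Only one tensor per source model
# is resident in memory at any moment; everything else stays on disk.

def load_tensor(model, name):
    # Hypothetical stand-in: a real implementation would read a single
    # tensor from a safetensors shard on disk.
    return model[name]

def merge_linear(models, weights):
    """Weighted average of matching tensors across source models."""
    merged = {}
    for name in models[0].keys():
        acc = None
        for model, w in zip(models, weights):
            tensor = load_tensor(model, name)  # load one tensor at a time
            scaled = [w * x for x in tensor]
            acc = scaled if acc is None else [a + s for a, s in zip(acc, scaled)]
        merged[name] = acc  # a real implementation would write to disk here
    return merged
```

With weights `[0.6, 0.4]` this reproduces the linear recipe from the Quick Use config above, one tensor at a time.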
## Self-Hosting & Configuration

- Merge recipes are defined in YAML files specifying models, method, and parameters
- Models can be referenced by local path or Hugging Face model ID
- The `--cuda` flag enables GPU acceleration for faster tensor operations
- The output directory contains a complete Hugging Face-compatible model ready for upload
- Advanced configs support per-layer weight overrides and custom tensor mappings

## Key Features

- Out-of-core merging enables processing models far larger than available memory
- SLERP (Spherical Linear Interpolation) preserves the geometry of weight manifolds
- The TIES and DARE methods resolve parameter conflicts between divergent fine-tunes
- Evolutionary merge search automatically optimizes merge recipes against evaluation benchmarks
- The passthrough method enables frankenmerging by stacking layers from different models

## Comparison with Similar Tools

- **PEFT/LoRA merging** — Merges adapter weights only; Mergekit merges full model weights to produce a standalone checkpoint
- **Model soups** — Simple weight averaging; Mergekit offers more sophisticated interpolation methods
- **LM-Cocktail** — A merge method from the research literature; Mergekit implements it alongside many other methods
- **LazyMergekit** — A Colab wrapper around Mergekit; it simplifies the interface but uses the same underlying library
- **Fine-tuning** — Trains on new data; merging combines existing capabilities without additional training compute

## FAQ

**Q: Does merging require a GPU?**

A: No. Mergekit can run entirely on CPU thanks to its out-of-core design. A GPU speeds up the process but is optional.

**Q: Will the merged model be as good as fine-tuning?**

A: Merging combines existing capabilities and can produce strong results, especially when the source models are complementary. For learning entirely new tasks, fine-tuning is more appropriate.
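The SLERP method highlighted under Key Features interpolates along the arc between two weight vectors rather than the straight chord, which is what preserves their norm geometry. The following is a minimal pure-Python sketch of the standard SLERP formula, not Mergekit's actual implementation (which operates on full model tensors):

```python
import math

def slerp(t, v0, v1, eps=1e-6):
    """Spherical linear interpolation between vectors v0 and v1 at fraction t."""
    dot = sum(a * b for a, b in zip(v0, v1))
    n0 = math.sqrt(sum(a * a for a in v0))
    n1 = math.sqrt(sum(b * b for b in v1))
    # Angle between the two weight vectors, with the cosine clamped
    # against floating-point drift outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / (n0 * n1)))
    theta = math.acos(cos_theta)
    if theta < eps:
        # Nearly parallel vectors: fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]
```

At `t = 0` the result is `v0`, at `t = 1` it is `v1`, and intermediate values stay on the arc between them instead of cutting through the interior as linear averaging does.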
**Q: What model architectures are supported?**

A: Mergekit supports most Hugging Face-compatible architectures, including Llama, Mistral, Qwen, Phi, and their derivatives.

**Q: How do I choose a merge method?**

A: SLERP is a good default for two models. TIES or DARE work better for merging three or more models because they resolve parameter conflicts. Linear is the simplest but can dilute distinctive features.

## Sources

- https://github.com/arcee-ai/mergekit
- https://arxiv.org/abs/2403.13257

---

Source: https://tokrepo.com/en/workflows/asset-0a9ca395
Author: Script Depot