# minimind — Train a 64M-Parameter LLM from Scratch in 2 Hours

> An open-source educational project that lets you train a small but functional language model from scratch on consumer hardware in about two hours, covering the full LLM training pipeline.

## Quick Use

```bash
git clone https://github.com/jingyaogong/minimind.git
cd minimind
pip install -r requirements.txt
python train_pretrain.py
python train_sft.py
python web_demo.py
```

## Introduction

minimind is an open-source educational project that demystifies LLM training by providing a complete pipeline for training a 64M-parameter language model from scratch in approximately two hours on a single consumer GPU. It covers pretraining, supervised fine-tuning, and DPO alignment.

## What minimind Does

- Trains a compact language model from scratch, with full pretraining on a text corpus
- Implements supervised fine-tuning (SFT) for instruction-following capabilities
- Includes DPO (Direct Preference Optimization) for basic alignment (a loss sketch appears near the end of this README)
- Provides an interactive web demo for chatting with the trained model
- Documents every training stage with clear explanations in both Chinese and English

## Architecture Overview

minimind implements a decoder-only transformer with rotary position embeddings, grouped query attention, and SwiGLU activation; an illustrative sketch of these components appears near the end of this README. The model uses a custom tokenizer trained on the same corpus. The training pipeline is built with PyTorch and supports distributed training via DDP, though a single GPU is sufficient for the default 64M configuration.

## Self-Hosting & Configuration

- Requires Python 3.9+ with PyTorch and a CUDA GPU (minimum 8GB VRAM)
- Pretraining data is included and can be replaced with custom text corpora
- Training configs control model size (26M to 218M parameters), learning rate, and batch size
- The web demo runs locally with Gradio, accessible through a browser
- Full training from scratch completes in about 2 hours on an RTX 3090

## Key Features

- End-to-end LLM training in minimal, readable code with extensive documentation
- Multiple model sizes from 26M to 218M parameters for different hardware budgets
- Complete pipeline covering tokenizer training, pretraining, SFT, and DPO alignment
- Bilingual documentation (Chinese and English), making it accessible to a global audience
- Modular design that allows swapping components such as attention mechanisms and position encodings

## Comparison with Similar Tools

- **nanochat** — Karpathy's chat-focused trainer; minimind focuses on the full pretraining pipeline with smaller models
- **nanoGPT** — pretraining only; minimind adds SFT and DPO stages for a complete chat model
- **LitGPT** — production fine-tuning toolkit; minimind prioritizes educational clarity over feature completeness
- **Axolotl** — advanced fine-tuning; minimind teaches fundamentals with a from-scratch approach

## FAQ

**Q: Can the trained model actually hold conversations?**
A: Yes. The 64M model handles simple conversations. Larger configs (218M) produce noticeably better results.

**Q: What GPU is required?**
A: An 8GB VRAM GPU (e.g., RTX 3060) works for the smallest model; 16GB+ is recommended for larger configs.

**Q: Is this useful beyond education?**
A: The codebase serves as a starting point for custom small-model development and domain-specific training experiments.

**Q: How does it compare to fine-tuning a pretrained model?**
A: Training from scratch produces weaker models but gives a complete understanding of the LLM pipeline. For production use, fine-tuning a pretrained model is more practical.
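## Architecture Sketch (Illustrative)

The Architecture Overview above names three components: rotary position embeddings, grouped query attention, and SwiGLU. The snippet below is a minimal, self-contained PyTorch sketch of how these pieces typically fit into a pre-norm decoder block. All class names, hyperparameters, and the use of `F.scaled_dot_product_attention` (PyTorch 2.0+) are assumptions made for illustration; they do not mirror minimind's actual module layout.

```python
# arch_sketch.py — illustrative only; not minimind's actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        # Normalize by the root-mean-square of the activations, then rescale.
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)


def rotary_embedding(x, base: float = 10000.0):
    # Apply rotary position embeddings to a (batch, heads, seq, head_dim) tensor.
    b, h, t, d = x.shape
    inv_freq = 1.0 / (base ** (torch.arange(0, d, 2, device=x.device).float() / d))
    angles = torch.arange(t, device=x.device).float()[:, None] * inv_freq[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out


class GroupedQueryAttention(nn.Module):
    def __init__(self, dim: int, n_heads: int, n_kv_heads: int):
        super().__init__()
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.q_proj = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        q, k = rotary_embedding(q), rotary_embedding(k)
        # Grouped-query attention: each KV head is shared by several query heads.
        repeat = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(repeat, dim=1)
        v = v.repeat_interleave(repeat, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, -1))


class SwiGLU(nn.Module):
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        # SwiGLU: a SiLU-gated linear unit used in place of a plain MLP.
        return self.down(F.silu(self.gate(x)) * self.up(x))


class DecoderBlock(nn.Module):
    def __init__(self, dim: int = 512, n_heads: int = 8, n_kv_heads: int = 2):
        super().__init__()
        self.attn_norm = RMSNorm(dim)
        self.attn = GroupedQueryAttention(dim, n_heads, n_kv_heads)
        self.ffn_norm = RMSNorm(dim)
        self.ffn = SwiGLU(dim, hidden=4 * dim)

    def forward(self, x):
        x = x + self.attn(self.attn_norm(x))  # pre-norm residual attention
        x = x + self.ffn(self.ffn_norm(x))    # pre-norm residual feed-forward
        return x


if __name__ == "__main__":
    block = DecoderBlock()
    print(block(torch.randn(2, 16, 512)).shape)  # torch.Size([2, 16, 512])
```

Running the file prints the output shape of a single block applied to a random batch; a full model stacks many such blocks behind a token embedding and a language-model head.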
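## DPO Loss Sketch (Illustrative)

The alignment stage uses Direct Preference Optimization, which adjusts the chat model so that preferred responses become more likely than rejected ones relative to a frozen reference model. The function below is a minimal sketch of the standard DPO objective, assuming per-response log-probabilities have already been computed; argument names and the toy usage are assumptions and do not reflect minimind's DPO training code.

```python
# dpo_loss_sketch.py — illustrative only; not minimind's actual DPO code.
import torch
import torch.nn.functional as F


def dpo_loss(policy_chosen_logp: torch.Tensor,
             policy_rejected_logp: torch.Tensor,
             ref_chosen_logp: torch.Tensor,
             ref_rejected_logp: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective.

    Each argument is the summed log-probability of an entire response
    (chosen or rejected) under either the policy or the frozen reference model.
    """
    # How much more the policy prefers each response than the reference does.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    # Logistic loss on the margin: push chosen above rejected by a beta-scaled gap.
    return -F.logsigmoid(beta * (chosen_reward - rejected_reward)).mean()


if __name__ == "__main__":
    # Toy batch of 4 preference pairs with random log-probabilities.
    lp = torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4)
    print(dpo_loss(*lp))
```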
## Sources

- https://github.com/jingyaogong/minimind