# minimind — Train a 64M-Parameter LLM from Scratch in 2 Hours

> An open-source educational project that lets you train a small but functional language model from scratch on consumer hardware in about two hours, covering the full LLM training pipeline.

## Quick Use

```bash
git clone https://github.com/jingyaogong/minimind.git
cd minimind
pip install -r requirements.txt
python train_pretrain.py
python train_sft.py
python web_demo.py
```

## Introduction

minimind is an open-source educational project that demystifies LLM training by providing a complete pipeline for training a 64M-parameter language model from scratch in approximately two hours on a single consumer GPU. It covers pretraining, supervised fine-tuning, and DPO alignment.

## What minimind Does

- Trains a compact language model from scratch, with full pretraining on a text corpus
- Implements supervised fine-tuning (SFT) for instruction-following capabilities
- Includes DPO (Direct Preference Optimization) for basic alignment (a loss sketch appears near the end of this README)
- Provides an interactive web demo for chatting with the trained model
- Documents every training stage with clear explanations in both Chinese and English

## Architecture Overview

minimind implements a decoder-only transformer with rotary position embeddings, grouped query attention, and SwiGLU activation; an illustrative sketch of these components appears near the end of this README. The model uses a custom tokenizer trained on the same corpus. The training pipeline is built with PyTorch and supports distributed training via DDP, though a single GPU is sufficient for the default 64M configuration.

## Self-Hosting & Configuration

- Requires Python 3.9+ with PyTorch and a CUDA GPU (minimum 8GB VRAM)
- Pretraining data is included and can be replaced with custom text corpora
- Training configs control model size (26M to 218M parameters), learning rate, and batch size
- The web demo runs locally with Gradio, accessible through a browser
- Full training from scratch completes in about 2 hours on an RTX 3090

## Key Features

- End-to-end LLM training in minimal, readable code with extensive documentation
- Multiple model sizes from 26M to 218M parameters for different hardware budgets
- Complete pipeline covering tokenizer training, pretraining, SFT, and DPO alignment
- Bilingual documentation (Chinese and English), making it accessible to a global audience
- Modular design that allows swapping components such as attention mechanisms and position encodings

## Comparison with Similar Tools

- **nanochat** — Karpathy's chat-focused trainer; minimind focuses on the full pretraining pipeline with smaller models
- **nanoGPT** — pretraining only; minimind adds SFT and DPO stages for a complete chat model
- **LitGPT** — production fine-tuning toolkit; minimind prioritizes educational clarity over feature completeness
- **Axolotl** — advanced fine-tuning; minimind teaches fundamentals with a from-scratch approach

## FAQ

**Q: Can the trained model actually hold conversations?**
A: Yes. The 64M model handles simple conversations. Larger configs (218M) produce noticeably better results.

**Q: What GPU is required?**
A: An 8GB VRAM GPU (e.g., RTX 3060) works for the smallest model; 16GB+ is recommended for larger configs.

**Q: Is this useful beyond education?**
A: The codebase serves as a starting point for custom small-model development and domain-specific training experiments.

**Q: How does it compare to fine-tuning a pretrained model?**
A: Training from scratch produces weaker models but gives a complete understanding of the LLM pipeline. For production use, fine-tuning a pretrained model is more practical.
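## Architecture Sketch (Illustrative)

The Architecture Overview above names three components: rotary position embeddings, grouped query attention, and SwiGLU. The snippet below is a minimal, self-contained PyTorch sketch of how these pieces typically fit into a pre-norm decoder block. All class names, hyperparameters, and the use of `F.scaled_dot_product_attention` (PyTorch 2.0+) are assumptions made for illustration; they do not mirror minimind's actual module layout.

```python
# arch_sketch.py — illustrative only; not minimind's actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        # Normalize by the root-mean-square of the activations, then rescale.
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)


def rotary_embedding(x, base: float = 10000.0):
    # Apply rotary position embeddings to a (batch, heads, seq, head_dim) tensor.
    b, h, t, d = x.shape
    inv_freq = 1.0 / (base ** (torch.arange(0, d, 2, device=x.device).float() / d))
    angles = torch.arange(t, device=x.device).float()[:, None] * inv_freq[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out


class GroupedQueryAttention(nn.Module):
    def __init__(self, dim: int, n_heads: int, n_kv_heads: int):
        super().__init__()
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.q_proj = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        q, k = rotary_embedding(q), rotary_embedding(k)
        # Grouped-query attention: each KV head is shared by several query heads.
        repeat = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(repeat, dim=1)
        v = v.repeat_interleave(repeat, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, -1))


class SwiGLU(nn.Module):
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        # SwiGLU: a SiLU-gated linear unit used in place of a plain MLP.
        return self.down(F.silu(self.gate(x)) * self.up(x))


class DecoderBlock(nn.Module):
    def __init__(self, dim: int = 512, n_heads: int = 8, n_kv_heads: int = 2):
        super().__init__()
        self.attn_norm = RMSNorm(dim)
        self.attn = GroupedQueryAttention(dim, n_heads, n_kv_heads)
        self.ffn_norm = RMSNorm(dim)
        self.ffn = SwiGLU(dim, hidden=4 * dim)

    def forward(self, x):
        x = x + self.attn(self.attn_norm(x))  # pre-norm residual attention
        x = x + self.ffn(self.ffn_norm(x))    # pre-norm residual feed-forward
        return x


if __name__ == "__main__":
    block = DecoderBlock()
    print(block(torch.randn(2, 16, 512)).shape)  # torch.Size([2, 16, 512])
```

Running the file prints the output shape of a single block applied to a random batch; a full model stacks many such blocks behind a token embedding and a language-model head.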
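## DPO Loss Sketch (Illustrative)

The alignment stage uses Direct Preference Optimization, which adjusts the chat model so that preferred responses become more likely than rejected ones relative to a frozen reference model. The function below is a minimal sketch of the standard DPO objective, assuming per-response log-probabilities have already been computed; argument names and the toy usage are assumptions and do not reflect minimind's DPO training code.

```python
# dpo_loss_sketch.py — illustrative only; not minimind's actual DPO code.
import torch
import torch.nn.functional as F


def dpo_loss(policy_chosen_logp: torch.Tensor,
             policy_rejected_logp: torch.Tensor,
             ref_chosen_logp: torch.Tensor,
             ref_rejected_logp: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective.

    Each argument is the summed log-probability of an entire response
    (chosen or rejected) under either the policy or the frozen reference model.
    """
    # How much more the policy prefers each response than the reference does.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    # Logistic loss on the margin: push chosen above rejected by a beta-scaled gap.
    return -F.logsigmoid(beta * (chosen_reward - rejected_reward)).mean()


if __name__ == "__main__":
    # Toy batch of 4 preference pairs with random log-probabilities.
    lp = torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4)
    print(dpo_loss(*lp))
```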
## Sources

- https://github.com/jingyaogong/minimind