Skills · May 13, 2026 · 3 min read

minimind — Train a 64M-Parameter LLM from Scratch in 2 Hours

An open-source educational project that lets you train a small but functional language model from scratch on consumer hardware in about two hours, covering the full LLM training pipeline.

Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, a per-adapter plan, and the raw content to help agents assess fit, risk, and next actions.

Native · 98/100 · Policy: allow
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established
Entry point: minimind Overview
Universal CLI command:
npx tokrepo install e48c6746-4f09-11f1-9bc6-00163e2b0d79

Introduction

minimind is an open-source educational project that demystifies LLM training by providing a complete pipeline to train a 64M-parameter language model from scratch in approximately two hours on a single consumer GPU. It covers pretraining, supervised fine-tuning, and DPO alignment.

What minimind Does

  • Trains a compact language model from scratch with full pretraining on a text corpus
  • Implements supervised fine-tuning (SFT) for instruction-following capabilities
  • Includes DPO (Direct Preference Optimization) for basic alignment
  • Provides an interactive web demo for chatting with the trained model
  • Documents every training stage with clear explanations in both Chinese and English
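The SFT and DPO stages listed above differ mainly in their loss functions. As a hedged, stdlib-only illustration (minimind itself uses PyTorch, and these function names are hypothetical), the sketch below computes a next-token cross-entropy that can be masked so that SFT learns only from response tokens (a common convention, assumed here rather than confirmed for minimind), and the standard DPO loss from summed sequence log-probabilities under the model being trained and a frozen reference model. The default `beta=0.1` is a commonly used value, not one taken from minimind.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax of a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def masked_next_token_loss(logits_per_pos, targets, loss_mask):
    """Mean cross-entropy over positions whose mask entry is 1.

    logits_per_pos[t] scores the vocabulary for predicting targets[t].
    Pretraining: every mask entry is 1.
    SFT (assumed convention): 0 on prompt tokens, 1 on response tokens.
    Returns 0.0 if no position is kept.
    """
    total, kept = 0.0, 0
    for logits, target, keep in zip(logits_per_pos, targets, loss_mask):
        if keep:
            total -= log_softmax(logits)[target]
            kept += 1
    return total / kept if kept else 0.0

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair from summed sequence log-probs.

    pi_*: log-probs under the model being trained; ref_*: under a frozen reference.
    loss = -log(sigmoid(beta * margin)), computed stably as softplus(-beta * margin).
    """
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    z = -beta * margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))
```

When the trained model and the reference assign identical log-probabilities, `dpo_loss` returns log 2; it drops below that as the model raises the chosen response relative to the rejected one, compared with the reference.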

Architecture Overview

minimind implements a decoder-only transformer architecture with rotary position embeddings, grouped query attention, and SwiGLU activation. The model uses a custom tokenizer trained on the same corpus. The training pipeline is built with PyTorch and supports distributed training via DDP, though a single GPU is sufficient for the default 64M configuration.
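To make the three architectural terms concrete, here is a pure-Python sketch of each on tiny vectors: RoPE rotating query/key pairs by a position-dependent angle, the GQA mapping from query heads to shared key/value heads, and a SwiGLU feed-forward. It is an illustration under assumptions, not minimind's PyTorch code: this RoPE rotates interleaved pairs `(x[2i], x[2i+1])` with base 10000, whereas some implementations rotate the two halves of the vector instead, and all function names are hypothetical.

```python
import math

def rope(x, pos, base=10000.0):
    """Rotary position embedding: rotate each pair (x[2i], x[2i+1]) by pos * base**(-2i/d)."""
    d = len(x)
    if d % 2:
        raise ValueError("RoPE needs an even head dimension")
    out = []
    for i in range(d // 2):
        angle = pos * base ** (-2 * i / d)
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[2 * i], x[2 * i + 1]
        out += [a * c - b * s, a * s + b * c]
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def kv_head_for(q_head, n_heads, n_kv_heads):
    """Grouped query attention: consecutive query heads share one key/value head."""
    if n_heads % n_kv_heads:
        raise ValueError("n_heads must be a multiple of n_kv_heads")
    return q_head // (n_heads // n_kv_heads)

def silu(z):
    # z * sigmoid(z), branched so math.exp never overflows for large |z|
    if z >= 0:
        return z / (1.0 + math.exp(-z))
    e = math.exp(z)
    return z * e / (1.0 + e)

def matvec(w, x):
    return [dot(row, x) for row in w]

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: W_down applied to SiLU(W_gate x) * (W_up x), elementwise."""
    hidden = [silu(g) * u for g, u in zip(matvec(w_gate, x), matvec(w_up, x))]
    return matvec(w_down, hidden)
```

Because each pair is rotated by an angle proportional to its position, the dot product of a rotated query and key depends only on the offset between their positions: `dot(rope(q, 5), rope(k, 3))` equals `dot(rope(q, 12), rope(k, 10))` up to floating-point error. With `n_heads=8` and `n_kv_heads=2`, query heads 0 to 3 read key/value head 0 and heads 4 to 7 read head 1, which shrinks the key/value projections and cache by a factor of four.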

Self-Hosting & Configuration

  • Requires Python 3.9+ with PyTorch and a CUDA GPU (minimum 8GB VRAM)
  • Pretraining data is included or can be replaced with custom text corpora
  • Training configs control model size (26M to 218M parameters), learning rate, and batch size
  • The web demo runs locally with Gradio, accessible through a browser
  • Full training from scratch completes in about 2 hours on an RTX 3090
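Model size and GPU memory follow from the config through simple arithmetic. The sketch below is a back-of-the-envelope calculator under stated assumptions: no bias terms, RMSNorm-style norms with one weight vector each, a SwiGLU FFN with three projection matrices, tied input/output embeddings by default, and fp32 AdamW training state at 16 bytes per parameter. The config values and function names are hypothetical, not read from minimind's config files.

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    vocab_size: int
    hidden: int
    n_layers: int
    n_heads: int
    n_kv_heads: int
    ffn_dim: int
    tied_embeddings: bool = True  # output head reuses the embedding matrix

def count_params(cfg: ModelConfig) -> int:
    head_dim = cfg.hidden // cfg.n_heads
    kv_dim = cfg.n_kv_heads * head_dim
    attn = 2 * cfg.hidden * cfg.hidden + 2 * cfg.hidden * kv_dim  # q, o + k, v (no biases)
    ffn = 3 * cfg.hidden * cfg.ffn_dim                            # gate, up, down
    norms = 2 * cfg.hidden                                        # two norm weight vectors per block
    per_layer = attn + ffn + norms
    embed = cfg.vocab_size * cfg.hidden
    head = 0 if cfg.tied_embeddings else cfg.vocab_size * cfg.hidden
    return embed + cfg.n_layers * per_layer + cfg.hidden + head   # + final norm

def training_state_bytes(n_params: int, bytes_per_param: int = 16) -> int:
    # fp32 weights (4) + fp32 grads (4) + AdamW first and second moments (4 + 4);
    # activations, which grow with batch size and sequence length, are NOT included
    return n_params * bytes_per_param

cfg = ModelConfig(vocab_size=6400, hidden=512, n_layers=8, n_heads=8, n_kv_heads=2, ffn_dim=1408)
n = count_params(cfg)
print(n, training_state_bytes(n))
```

For this hypothetical config, `count_params` gives 25,829,888 parameters (about 25.8M), and weights, gradients and AdamW moments take roughly 0.41 GB. By the same 16-bytes-per-parameter rule, a 218M-parameter model needs about 3.5 GB before activations are counted.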

Key Features

  • End-to-end LLM training in minimal, readable code with extensive documentation
  • Multiple model sizes from 26M to 218M parameters for different hardware budgets
  • Complete pipeline covering tokenizer training, pretraining, SFT, and DPO alignment
  • Bilingual documentation (Chinese and English) making it accessible to a global audience
  • Modular design that allows swapping components such as attention mechanisms and position encodings
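The pipeline starts with tokenizer training. The text only says minimind trains a custom tokenizer on its corpus; assuming a byte-pair-encoding (BPE) style tokenizer, the toy learner below shows the core loop: count adjacent symbol pairs weighted by word frequency, merge the most frequent pair, and repeat. It works on characters with no pre-tokenization or byte fallback, so it is a teaching sketch rather than minimind's tokenizer code, and the function names are hypothetical.

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Replace every left-to-right occurrence of `pair` with the concatenated symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(word_counts, num_merges):
    """Learn up to num_merges BPE merges from a {word: frequency} dict."""
    vocab = Counter({tuple(w): f for w, f in word_counts.items()})
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # most frequent pair; ties go to the lexicographically larger pair
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges

def apply_bpe(word, merges):
    """Tokenize a word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)
```

With the word counts `{"low": 5, "lower": 2, "newest": 6, "widest": 3}` and two merges, the learner merges `s t` and then `e st`, so `apply_bpe("lowest", merges)` returns `['l', 'o', 'w', 'est']`: a word never seen in training still splits into learned subwords.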

Comparison with Similar Tools

  • nanochat — Karpathy's end-to-end chat-model trainer (tokenizer through chat fine-tuning); minimind targets smaller models that train on a single consumer GPU
  • nanoGPT — pretraining only; minimind adds SFT and DPO stages for a complete chat model
  • LitGPT — production fine-tuning toolkit; minimind prioritizes educational clarity over feature completeness
  • Axolotl — advanced fine-tuning; minimind teaches fundamentals with a from-scratch approach

FAQ

Q: Can the trained model actually hold conversations? A: Yes. The 64M model handles simple conversations. Larger configs (218M) produce noticeably better results.

Q: What GPU is required? A: An 8GB VRAM GPU (e.g., RTX 3060) works for the smallest model. 16GB+ recommended for larger configs.

Q: Is this useful beyond education? A: The codebase serves as a starting point for custom small model development and domain-specific training experiments.

Q: How does it compare to fine-tuning a pretrained model? A: Training from scratch produces weaker models but exposes every stage of the LLM pipeline. For production use, fine-tuning an existing pretrained model is usually more practical.
