Skills · May 13, 2026 · 3 min read

minimind — Train a 64M-Parameter LLM from Scratch in 2 Hours

An open-source educational project that lets you train a small but functional language model from scratch on consumer hardware in about two hours, covering the full LLM training pipeline.

Universal CLI install command
npx tokrepo install e48c6746-4f09-11f1-9bc6-00163e2b0d79

Introduction

minimind is an open-source educational project that demystifies LLM training by providing a complete pipeline to train a 64M-parameter language model from scratch in approximately two hours on a single consumer GPU. It covers pretraining, supervised fine-tuning, and DPO alignment.

What minimind Does

  • Trains a compact language model from scratch with full pretraining on a text corpus
  • Implements supervised fine-tuning (SFT) for instruction-following capabilities
  • Includes DPO (Direct Preference Optimization) for basic alignment
  • Provides an interactive web demo for chatting with the trained model
  • Documents every training stage with clear explanations in both Chinese and English
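
The DPO stage listed above can be illustrated with the standard DPO objective for a single preference pair. This is a minimal sketch in plain Python, not minimind's own code: the function names, the argument names, and the beta value of 0.1 are assumptions chosen for illustration. Each input is the summed log-probability that the trained policy or the frozen reference model assigns to the chosen or rejected response.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) pair.

    beta is a hypothetical default; it scales how strongly the policy
    is pushed away from the reference model.
    """
    # Implicit rewards: how much more the policy likes each response
    # than the reference model does.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    # The loss shrinks as the chosen response gains margin over the rejected one.
    return -log_sigmoid(beta * (chosen_reward - rejected_reward))


if __name__ == "__main__":
    # Policy identical to the reference: the margin is zero.
    print(dpo_loss(-5.0, -7.0, -5.0, -7.0))
    # Policy has moved toward the chosen response: the loss is lower.
    print(dpo_loss(-4.0, -8.0, -5.0, -7.0))
```

When the policy and reference agree, the margin is zero and the loss equals log 2. Training lowers it by raising the chosen response's log-probability relative to the rejected one.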

Architecture Overview

minimind implements a decoder-only transformer with rotary position embeddings (RoPE), grouped-query attention (GQA), and SwiGLU activations. It uses a custom tokenizer trained on the pretraining corpus. The training pipeline is built with PyTorch and supports multi-GPU training via DistributedDataParallel (DDP), though a single GPU is sufficient for the default 64M configuration.
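
To make rotary position embeddings concrete, here is a minimal sketch in plain Python. It is not taken from minimind: the function name `rope`, the interleaved pairing of dimensions, and the base of 10000 are assumptions (some implementations pair the first half of the vector with the second half instead). The idea is that each pair of dimensions is rotated by an angle proportional to the token position, so the dot product between a rotated query and a rotated key depends only on their relative distance.

```python
import math


def rope(vec: list[float], position: int, base: float = 10000.0) -> list[float]:
    """Apply a rotary position embedding to one vector.

    Consecutive pairs (vec[i], vec[i + 1]) are rotated by the angle
    position * base ** (-i / d), where i is the even index of the pair.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        # Standard 2D rotation of the pair (x, y) by theta.
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0, 4.0]
    k = [0.5, -1.0, 2.0, 0.25]
    # Both pairs of positions are 2 apart, so the two scores should match
    # up to floating-point error.
    print(dot(rope(q, 5), rope(k, 3)))
    print(dot(rope(q, 12), rope(k, 10)))
```

Because the rotation preserves vector length and encodes only relative offsets, no separate learned position table is needed.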

Self-Hosting & Configuration

  • Requires Python 3.9+ with PyTorch and a CUDA GPU (minimum 8GB VRAM)
  • Pretraining data is included or can be replaced with custom text corpora
  • Training configs control model size (26M to 218M parameters), learning rate, and batch size
  • The web demo runs locally with Gradio, accessible through a browser
  • Full training of the default 64M model from scratch completes in about 2 hours on an RTX 3090
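
To see how the model-size settings above translate into a parameter count, the arithmetic for a decoder-only transformer with grouped-query attention and a SwiGLU feed-forward block can be sketched as follows. The function and every hyperparameter value in the example are hypothetical, chosen so the total lands near the low end of the 26M to 218M range; they are not read from minimind's configs. The sketch also assumes no bias terms, weight-only normalization layers, and (by default) an output head that shares weights with the token embedding.

```python
def count_params(vocab_size: int,
                 d_model: int,
                 n_layers: int,
                 n_heads: int,
                 n_kv_heads: int,
                 ffn_hidden: int,
                 tie_embeddings: bool = True) -> int:
    """Estimate the parameter count of a decoder-only transformer.

    Assumptions (not confirmed for minimind): no biases, two weight-only
    norms per layer plus one final norm, SwiGLU with three projections.
    """
    head_dim = d_model // n_heads
    kv_dim = n_kv_heads * head_dim               # GQA: fewer key/value heads
    attn = 2 * d_model * d_model + 2 * d_model * kv_dim   # Q and O, then K and V
    ffn = 3 * d_model * ffn_hidden               # gate, up, and down projections
    norms = 2 * d_model                          # pre-attention and pre-FFN norms
    per_layer = attn + ffn + norms
    embed = vocab_size * d_model
    lm_head = 0 if tie_embeddings else embed
    return embed + n_layers * per_layer + d_model + lm_head


if __name__ == "__main__":
    # Hypothetical configuration for illustration only.
    total = count_params(vocab_size=6400, d_model=512, n_layers=8,
                         n_heads=8, n_kv_heads=2, ffn_hidden=1408)
    print(f"{total / 1e6:.1f}M")  # prints 25.8M
```

With a small vocabulary most of the parameters sit in the transformer layers rather than the embedding table, so widening `d_model` or adding layers is what moves a config up the size range.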

Key Features

  • End-to-end LLM training in minimal, readable code with extensive documentation
  • Multiple model sizes from 26M to 218M parameters for different hardware budgets
  • Complete pipeline covering tokenizer training, pretraining, SFT, and DPO alignment
  • Bilingual documentation (Chinese and English) making it accessible to a global audience
  • Modular design allows swapping components like attention mechanisms and position encodings
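
The pretraining and SFT stages named in the list above typically share one objective: next-token cross-entropy. For SFT, a common convention is to average the loss only over response tokens and ignore prompt tokens. The source does not say whether minimind does this, so the mask below is an assumption, as are the function name and argument layout. A minimal sketch in plain Python:

```python
import math


def masked_next_token_loss(logits: list[list[float]],
                           targets: list[int],
                           loss_mask: list[int]) -> float:
    """Average cross-entropy over positions where loss_mask is 1.

    logits[t] holds the unnormalized scores over the vocabulary at step t,
    targets[t] is the index of the correct next token, and loss_mask[t]
    is 0 for positions to ignore (for example, prompt tokens in SFT).
    """
    total, count = 0.0, 0
    for row, target, keep in zip(logits, targets, loss_mask):
        if not keep:
            continue
        # Stable log-sum-exp to get the log of the softmax denominator.
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[target]   # negative log-probability of the target
        count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    logits = [[0.0, 0.0, 0.0, 0.0],   # prompt position, ignored
              [2.0, 0.5, 0.1, 0.1],   # response position
              [0.1, 0.2, 3.0, 0.1]]   # response position
    print(masked_next_token_loss(logits, targets=[1, 0, 2], loss_mask=[0, 1, 1]))
```

Setting every mask entry to 1 recovers the plain pretraining loss, which is why the two stages can share most of their training code.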

Comparison with Similar Tools

  • nanochat — Karpathy's chat-focused trainer; minimind focuses on the full pretraining pipeline with smaller models
  • nanoGPT — pretraining only; minimind adds SFT and DPO stages for a complete chat model
  • LitGPT — production fine-tuning toolkit; minimind prioritizes educational clarity over feature completeness
  • Axolotl — advanced fine-tuning; minimind teaches fundamentals with a from-scratch approach

FAQ

Q: Can the trained model actually hold conversations? A: Yes. The 64M model handles simple conversations. Larger configs (218M) produce noticeably better results.

Q: What GPU is required? A: A GPU with 8GB of VRAM (e.g., RTX 3060) works for the smallest model. 16GB or more is recommended for larger configs.

Q: Is this useful beyond education? A: The codebase serves as a starting point for custom small model development and domain-specific training experiments.

Q: How does it compare to fine-tuning a pretrained model? A: Training from scratch produces weaker models but exposes every stage of the LLM pipeline. For production use, fine-tuning a pretrained model is more practical.
