Skills · May 13, 2026 · 1 min read

minimind — Train a 64M-Parameter LLM from Scratch in 2 Hours

An open-source educational project that lets you train a small but functional language model from scratch on consumer hardware in about two hours, covering the full LLM training pipeline.

Universal CLI install command:
npx tokrepo install e48c6746-4f09-11f1-9bc6-00163e2b0d79

Introduction

minimind is an open-source educational project that demystifies LLM training by providing a complete pipeline to train a 64M-parameter language model from scratch in approximately two hours on a single consumer GPU. It covers pretraining, supervised fine-tuning, and DPO alignment.

What minimind Does

  • Trains a compact language model from scratch with full pretraining on a text corpus
  • Implements supervised fine-tuning (SFT) for instruction-following capabilities
  • Includes DPO (Direct Preference Optimization) for basic alignment
  • Provides an interactive web demo for chatting with the trained model
  • Documents every training stage with clear explanations in both Chinese and English
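The DPO stage listed above optimizes a preference objective over pairs of responses. The following is a minimal sketch of the standard DPO loss for a single (chosen, rejected) pair, written in plain Python so the arithmetic is visible. The function and argument names are illustrative and are not taken from the minimind codebase; the inputs are assumed to be summed sequence log-probabilities from the trained policy and from a frozen reference model.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative value, not a minimind default
) -> float:
    """DPO loss for one preference pair.

    Each argument is the log-probability of a whole response under the
    policy or the frozen reference model (hypothetical inputs).
    """
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # The loss falls as the chosen response gains on the rejected one.
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))


# A policy identical to the reference sits at the neutral point, log(2).
neutral = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# A policy that has shifted probability toward the chosen response scores lower.
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)
```

In a real training loop the same expression would be computed on batched tensors and averaged; the scalar form here only illustrates what the objective rewards.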

Architecture Overview

minimind implements a decoder-only transformer architecture with rotary position embeddings, grouped query attention, and SwiGLU activation. The model uses a custom tokenizer trained on the same corpus. The training pipeline is built with PyTorch and supports distributed training via DDP, though a single GPU is sufficient for the default 64M configuration.
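As an illustration of the rotary position embeddings mentioned above, here is a small plain-Python sketch of the technique. It rotates consecutive pairs of a vector by position-dependent angles. The interleaved pairing and the base of 10000 are common conventions assumed here, not details confirmed from the minimind source, and `rope` and `dot` are hypothetical helper names.

```python
import math


def rope(vec, position, base=10000.0):
    """Apply a rotary position embedding to one query or key vector.

    Pairs (vec[0], vec[1]), (vec[2], vec[3]), ... are each rotated by an
    angle that grows with the token position and shrinks with pair index.
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)  # angle for this pair
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))
```

The useful property is that the score between a rotated query and a rotated key depends only on the distance between their positions, so `dot(rope(q, 5), rope(k, 3))` matches `dot(rope(q, 12), rope(k, 10))` up to floating-point error.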

Self-Hosting & Configuration

  • Requires Python 3.9+, PyTorch, and a CUDA-capable GPU with at least 8 GB of VRAM
  • Pretraining data is included or can be replaced with custom text corpora
  • Training configs control model size (26M to 218M parameters), learning rate, and batch size
  • The web demo runs locally with Gradio, accessible through a browser
  • Full training from scratch completes in about 2 hours on an RTX 3090
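To get a feel for how config values translate into model size, the sketch below estimates the parameter count of a decoder-only transformer with grouped query attention and a SwiGLU feed-forward block. The `ModelConfig` fields, the example values, the tied embeddings, the bias-free linear layers, and the one-weight-vector norms are all assumptions made for illustration; they are not read from minimind's actual config files.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    """Hypothetical config; field names are not minimind's."""
    vocab_size: int
    dim: int
    n_layers: int
    n_heads: int
    n_kv_heads: int      # fewer KV heads than query heads gives GQA
    ffn_hidden: int
    tie_embeddings: bool = True


def count_parameters(cfg: ModelConfig) -> int:
    head_dim = cfg.dim // cfg.n_heads
    kv_dim = cfg.n_kv_heads * head_dim
    # Query and output projections are dim x dim; key and value are dim x kv_dim.
    attention = 2 * cfg.dim * cfg.dim + 2 * cfg.dim * kv_dim
    # SwiGLU uses three matrices: gate, up, and down projections.
    feed_forward = 3 * cfg.dim * cfg.ffn_hidden
    # Two norm weight vectors per layer (assumed, no biases).
    norms = 2 * cfg.dim
    per_layer = attention + feed_forward + norms
    embedding = cfg.vocab_size * cfg.dim
    output_head = 0 if cfg.tie_embeddings else embedding
    final_norm = cfg.dim
    return embedding + output_head + cfg.n_layers * per_layer + final_norm


# Illustrative values only.
example = ModelConfig(vocab_size=6400, dim=512, n_layers=8,
                      n_heads=8, n_kv_heads=2, ffn_hidden=1408)
total = count_parameters(example)
print(f"{total / 1e6:.1f}M parameters")
```

With these example values the estimate comes to roughly 25.8M parameters. Widening `dim` or adding layers grows the per-layer term, while the vocabulary size only affects the embedding term.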

Key Features

  • End-to-end LLM training in minimal, readable code with extensive documentation
  • Multiple model sizes from 26M to 218M parameters for different hardware budgets
  • Complete pipeline covering tokenizer training, pretraining, SFT, and DPO alignment
  • Bilingual documentation (Chinese and English) making it accessible to a global audience
  • Modular design allows swapping components like attention mechanisms and position encodings
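The SFT stage in the pipeline above differs from pretraining mainly in which tokens contribute to the loss. A common convention, assumed here rather than confirmed from the minimind code, is to compute the loss only on assistant tokens so the model is not trained to reproduce the user's prompt. The helper names and the per-token role labels below are hypothetical.

```python
def build_loss_mask(roles):
    """Return 1 for tokens the model should learn to produce, else 0."""
    return [1 if role == "assistant" else 0 for role in roles]


def masked_cross_entropy(token_logprobs, loss_mask):
    """Average negative log-likelihood over the unmasked positions.

    token_logprobs[i] is the log-probability the model assigned to the
    correct token at position i (hypothetical input).
    """
    selected = [-lp for lp, keep in zip(token_logprobs, loss_mask) if keep]
    if not selected:
        return 0.0
    return sum(selected) / len(selected)


# Two prompt tokens followed by two response tokens.
roles = ["user", "user", "assistant", "assistant"]
logprobs = [-5.0, -5.0, -1.0, -3.0]
mask = build_loss_mask(roles)
loss = masked_cross_entropy(logprobs, mask)
```

Here the poorly predicted prompt tokens are ignored, and the loss is the mean over the two response tokens only.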

Comparison with Similar Tools

  • nanochat — Karpathy's chat-focused trainer; minimind focuses on the full pretraining pipeline with smaller models
  • nanoGPT — pretraining only; minimind adds SFT and DPO stages for a complete chat model
  • LitGPT — production fine-tuning toolkit; minimind prioritizes educational clarity over feature completeness
  • Axolotl — advanced fine-tuning; minimind teaches fundamentals with a from-scratch approach

FAQ

Q: Can the trained model actually hold conversations? A: Yes. The 64M model handles simple conversations. Larger configs (218M) produce noticeably better results.

Q: What GPU is required? A: A GPU with 8 GB of VRAM (e.g., RTX 3060) works for the smallest model. 16 GB or more is recommended for the larger configs.
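As a rough back-of-the-envelope check on memory needs, the sketch below estimates the memory taken by weights, gradients, and optimizer state. It assumes fp32 training with an Adam-style optimizer that keeps two extra values per parameter, which is an assumption rather than a documented minimind setting, and it deliberately ignores activations, which depend on batch size and sequence length and often dominate for small models.

```python
def training_state_bytes(n_params, weight_bytes=4, grad_bytes=4, optimizer_bytes=8):
    """Bytes for weights, gradients, and optimizer state (no activations)."""
    return n_params * (weight_bytes + grad_bytes + optimizer_bytes)


def to_gib(n_bytes):
    """Convert bytes to GiB."""
    return n_bytes / 2**30


state = training_state_bytes(64_000_000)
print(f"{to_gib(state):.2f} GiB before activations")
```

Under these assumptions a 64M-parameter model needs about 1 GB for its training state, which suggests that most of the VRAM budget goes to activations and batch size rather than to the parameters themselves.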

Q: Is this useful beyond education? A: The codebase serves as a starting point for custom small model development and domain-specific training experiments.

Q: How does it compare to fine-tuning a pretrained model? A: Training from scratch produces a weaker model than fine-tuning a pretrained one, but it exposes every stage of the LLM pipeline. For production use, fine-tuning is more practical.
