Skills · May 2, 2026 · 1 min read

nanoGPT — The Simplest Repository for Training Medium-Sized GPTs

A minimal, readable codebase for training GPT-class models from scratch or fine-tuning pretrained checkpoints. Written by Andrej Karpathy, nanoGPT strips away framework complexity so you can understand every line of the training loop.

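Agent Ready

This asset can be read and installed directly by an Agent

TokRepo provides a universal CLI command, an install contract, metadata JSON, adapter-specific install plans, and raw content links, so an Agent can assess fit, risk, and next steps.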

Native · 98/100 · Policy: Allowed
Agent entry: Any MCP/CLI Agent
Type: Skill
Install: Single
Trust level: Established
Entry: nanoGPT Overview
Universal CLI install command:
npx tokrepo install 63ba44df-45df-11f1-9bc6-00163e2b0d79

Introduction

nanoGPT is a minimal PyTorch reimplementation of GPT training created by Andrej Karpathy. It is designed to be the simplest, most readable code for training and fine-tuning medium-sized GPT models, making transformer internals accessible to anyone who can read Python.

What nanoGPT Does

  • Trains GPT-2 scale models from scratch on custom datasets
  • Reproduces GPT-2 (124M) on OpenWebText in about 4 days on 8x A100 GPUs
  • Supports character-level tokenization and GPT-2 BPE tokenization (via tiktoken)
  • Provides data preparation scripts for Shakespeare, OpenWebText, and custom corpora
  • Enables fine-tuning pre-trained GPT-2 checkpoints on new data
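
The character-level option in the list above can be illustrated with a short sketch. This is a minimal stand-in, assuming a vocabulary built from the distinct characters of the corpus; the function names `build_vocab`, `encode`, and `decode` are illustrative and not taken from nanoGPT.

```python
# Character-level tokenizer sketch (illustrative names, not nanoGPT's API).

def build_vocab(text):
    """Map every distinct character to an integer id, and back."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(text, stoi):
    """Turn a string into a list of integer token ids."""
    return [stoi[ch] for ch in text]


def decode(ids, itos):
    """Turn a list of token ids back into a string."""
    return "".join(itos[i] for i in ids)


corpus = "to be, or not to be"
stoi, itos = build_vocab(corpus)
ids = encode("to be", stoi)
print(len(stoi), ids, decode(ids, itos))
```

A character vocabulary stays tiny (dozens of ids), which keeps the embedding table small and makes the Shakespeare example quick to train. BPE trades a much larger vocabulary for shorter token sequences.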

Architecture Overview

The entire model is defined in a single file, model.py, which implements a standard decoder-only transformer with causal self-attention, GELU activations, and optional Flash Attention. Training logic lives in train.py, which uses PyTorch DDP for multi-GPU runs and torch.amp for mixed precision. Configuration files are plain Python that override the built-in defaults.
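
To make "causal self-attention" concrete, here is a minimal single-head sketch in plain Python. It is not nanoGPT's code, which works on batched multi-head PyTorch tensors. It only shows the rule that position t may attend to positions 0..t and never to later ones. The name `causal_attention` and the toy inputs are assumptions for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(q, k, v):
    """Single-head causal attention over lists of vectors.

    q, k, v are lists of T vectors. Position t attends only to positions 0..t.
    """
    d = len(q[0])
    out = []
    for t in range(len(q)):
        # Scaled dot-product scores against the current and earlier keys only.
        scores = [
            sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        # Weighted sum of the visible value vectors.
        out.append([
            sum(w * v[s][j] for s, w in enumerate(weights))
            for j in range(len(v[0]))
        ])
    return out


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = causal_attention(x, x, x)

# Changing a later token must not change the outputs at earlier positions.
x_changed = x[:2] + [[5.0, -3.0]]
y_changed = causal_attention(x_changed, x_changed, x_changed)
print(y[:2] == y_changed[:2])
```

The masking is what lets the model train on every position of a sequence at once while still predicting each token only from its past.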

Self-Hosting & Configuration

  • Requires Python 3.10+ and PyTorch 2.0+
  • Data preparation scripts convert raw text into memory-mapped binary token arrays
  • Config files set model size, learning rate, batch size, and device count
  • Supports single GPU, multi-GPU DDP, and Apple MPS backends
  • Weights & Biases logging is optional and off by default; enable it with --wandb_log=True
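
The layering of defaults, Python config files, and command-line flags described above can be sketched as follows. This is a simplified illustration under stated assumptions, not nanoGPT's actual configuration code: the helper `apply_overrides`, the default values, and the inline config string are all hypothetical.

```python
import ast


def apply_overrides(defaults, config_source="", cli_args=()):
    """Return a copy of `defaults` updated by a Python config snippet and CLI flags."""
    config = dict(defaults)

    # 1) A config "file" is plain Python: run it and keep names that match known keys.
    scope = {}
    exec(config_source, {}, scope)
    for key, value in scope.items():
        if key in config:
            config[key] = value

    # 2) Flags of the form --key=value override both the defaults and the config file.
    for arg in cli_args:
        if not arg.startswith("--") or "=" not in arg:
            raise ValueError(f"expected --key=value, got {arg!r}")
        key, raw = arg[2:].split("=", 1)
        if key not in config:
            raise ValueError(f"unknown config key: {key}")
        try:
            value = ast.literal_eval(raw)  # numbers, booleans, quoted strings
        except (ValueError, SyntaxError):
            value = raw  # fall back to the raw string
        config[key] = value
    return config


# Illustrative defaults and config contents (assumed values for the example).
defaults = {"n_layer": 12, "learning_rate": 6e-4, "batch_size": 12, "wandb_log": False}
config_source = "n_layer = 6\nbatch_size = 64\n"
cfg = apply_overrides(defaults, config_source, ["--wandb_log=True", "--learning_rate=1e-3"])
print(cfg)
```

Because the config is executed as Python, it should only ever be a file you trust. The upside is that no separate config schema or parser is needed.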

Key Features

  • The training loop (train.py) and the model definition (model.py) are each roughly 300 lines of Python
  • Reproduces the published GPT-2 (124M) result on OpenWebText, reaching a validation loss of about 2.85
  • Flash Attention support via PyTorch 2.0 scaled_dot_product_attention
  • Sampling script generates text from trained checkpoints immediately
  • Clean separation of data prep, training, and inference stages
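
Generating text from a checkpoint comes down to repeatedly drawing the next token from the model's output logits. The sketch below shows one common way to do that draw, with temperature scaling and top-k filtering. It is a plain-Python illustration; `sample_next` and the toy logits are assumptions rather than nanoGPT's sampling script.

```python
import math
import random


def sample_next(logits, temperature=1.0, top_k=None, rng=random):
    """Draw one token id from `logits` using temperature and optional top-k filtering."""
    scaled = [x / temperature for x in logits]
    ids = list(range(len(scaled)))
    if top_k is not None:
        # Keep only the k highest-scoring token ids.
        ids = sorted(ids, key=lambda i: scaled[i], reverse=True)[:top_k]
    # Unnormalized softmax weights, shifted by the max for numerical stability.
    m = max(scaled[i] for i in ids)
    weights = [math.exp(scaled[i] - m) for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the run is repeatable
logits = [2.0, 1.0, 0.1, -1.0]
draws = [sample_next(logits, temperature=0.8, top_k=2, rng=rng) for _ in range(1000)]
print(draws.count(0), draws.count(1))
```

Lower temperatures sharpen the distribution toward the most likely token, while a small `top_k` stops the sampler from ever picking tokens in the long, unlikely tail.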

Comparison with Similar Tools

  • Hugging Face Transformers — full-featured library with thousands of models; nanoGPT is purpose-built for learning and small experiments
  • Megatron-LM — NVIDIA's large-scale training framework; far more complex, targets multi-node clusters
  • LitGPT — Lightning-based GPT training; adds configuration abstractions nanoGPT deliberately avoids
  • minGPT — Karpathy's earlier project; nanoGPT is the faster, more optimized successor

FAQ

Q: Can I train a production-grade LLM with nanoGPT? A: It is optimized for learning and reproducing GPT-2. For production-scale training, frameworks like Megatron-LM or LLaMA-Factory are more appropriate.

Q: What hardware do I need? A: A single consumer GPU with 8 GB VRAM can train the character-level Shakespeare model. Reproducing GPT-2 124M in about 4 days takes a node of 8 A100 GPUs; smaller setups can run the same training but take proportionally longer.
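
For a rough sense of scale, the parameter count of a GPT-2 style model can be worked out from its hyperparameters. The sketch below uses the standard GPT-2 small settings (12 layers, width 768, vocabulary 50257, context 1024) and assumes biases everywhere and tied input/output embeddings. The memory figure rests on a stated assumption of 16 bytes per parameter and excludes activations, so treat it as a lower bound rather than a sizing guide.

```python
def gpt_param_count(n_layer, n_embd, vocab_size, block_size):
    """Count parameters of a GPT-2 style decoder with biases and tied embeddings."""
    tok_emb = vocab_size * n_embd
    pos_emb = block_size * n_embd
    layer_norm = 2 * n_embd                      # weight + bias
    attn = n_embd * 3 * n_embd + 3 * n_embd      # fused q, k, v projection
    attn += n_embd * n_embd + n_embd             # attention output projection
    mlp = n_embd * 4 * n_embd + 4 * n_embd       # expand to 4x width
    mlp += 4 * n_embd * n_embd + n_embd          # project back to model width
    block = 2 * layer_norm + attn + mlp
    return tok_emb + pos_emb + n_layer * block + layer_norm  # plus final layer norm


n_params = gpt_param_count(n_layer=12, n_embd=768, vocab_size=50257, block_size=1024)

# Assumption: fp32 weights, gradients, and two AdamW moment buffers = 16 bytes per parameter.
train_state_gib = n_params * 16 / 2**30
print(f"{n_params:,} parameters, ~{train_state_gib:.2f} GiB for weights, grads, and optimizer state")
```

Under these assumptions the weights and optimizer state fit comfortably on one GPU. The multi-GPU setup for the GPT-2 reproduction is mainly about throughput over a large dataset.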

Q: Does it support LoRA or adapter-based fine-tuning? A: Not natively. The codebase does full-parameter fine-tuning. Community forks add PEFT methods.

Q: Is the code actively maintained? A: The repository is intentionally minimal and stable. Updates are infrequent by design.
