
nanoGPT — The Simplest Repository for Training Medium-Sized GPTs

A minimal, readable codebase for training and fine-tuning GPT-class models from scratch. Written by Andrej Karpathy, nanoGPT strips away framework complexity so you can understand every line of the training loop.

Introduction

nanoGPT is a minimal PyTorch reimplementation of GPT training created by Andrej Karpathy. It is designed to be the simplest, most readable code for training and fine-tuning medium-sized GPT models, making transformer internals accessible to anyone who can read Python.

What nanoGPT Does

  • Trains GPT-2 scale models from scratch on custom datasets
  • Reproduces GPT-2 (124M) on OpenWebText in about 4 days on 8x A100 GPUs
  • Supports character-level and BPE tokenization
  • Provides data preparation scripts for Shakespeare, OpenWebText, and custom corpora (see the sketch after this list)
  • Enables fine-tuning pre-trained GPT-2 checkpoints on new data
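
Both tokenization paths end up in the same place: a flat array of uint16 token ids written to .bin files that training later reads back through np.memmap. A condensed sketch in the spirit of the prepare.py scripts, where input.txt and the output filenames are placeholders:

```python
import numpy as np
import tiktoken  # GPT-2 BPE tokenizer, the same library nanoGPT's prep scripts use

text = open("input.txt", "r", encoding="utf-8").read()

# BPE path: encode with the GPT-2 vocabulary (~50k tokens)
enc = tiktoken.get_encoding("gpt2")
ids = enc.encode_ordinary(text)

# Character-level path (alternative): the vocabulary is just the unique characters
# chars = sorted(set(text)); stoi = {ch: i for i, ch in enumerate(chars)}
# ids = [stoi[ch] for ch in text]

# 90/10 train/val split, stored as raw uint16 so training can memory-map the files
n = int(0.9 * len(ids))
np.array(ids[:n], dtype=np.uint16).tofile("train.bin")
np.array(ids[n:], dtype=np.uint16).tofile("val.bin")
```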

Architecture Overview

The entire model is defined in a single model.py file: a standard decoder-only transformer with causal self-attention, GELU activations, and optional Flash Attention. Training logic lives in train.py, which uses PyTorch DDP for multi-GPU runs and mixed precision via torch.amp. Configuration is handled by plain Python files that override train.py's defaults.
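
A minimal sketch of that attention block, assuming PyTorch 2.0: a fused QKV projection, per-head reshaping, and torch.nn.functional.scaled_dot_product_attention with is_causal=True, which dispatches to a Flash Attention kernel when one is available. The class below is a simplified stand-in rather than the exact code in model.py, though the hyperparameter names (n_embd, n_head) follow nanoGPT's conventions.

```python
import torch
import torch.nn as nn
from torch.nn import functional as F

class CausalSelfAttention(nn.Module):
    """Simplified causal self-attention in the style of nanoGPT's model.py."""

    def __init__(self, n_embd: int, n_head: int, dropout: float = 0.0):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.c_attn = nn.Linear(n_embd, 3 * n_embd)   # fused q, k, v projection
        self.c_proj = nn.Linear(n_embd, n_embd)       # output projection
        self.dropout = dropout

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.c_attn(x).split(C, dim=2)
        # reshape to (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # PyTorch 2.0 fused kernel; is_causal masks out future positions
        y = F.scaled_dot_product_attention(
            q, k, v,
            dropout_p=self.dropout if self.training else 0.0,
            is_causal=True,
        )
        y = y.transpose(1, 2).contiguous().view(B, T, C)  # re-assemble the heads
        return self.c_proj(y)
```

Stacking a handful of these blocks with MLPs, LayerNorms, and token/position embeddings is essentially the whole of model.py.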

Self-Hosting & Configuration

  • Requires Python 3.10+ and PyTorch 2.0+
  • Data preparation scripts convert raw text into memory-mapped binary token arrays
  • Config files set model size, learning rate, batch size, and target device (see the example after this list)
  • Supports single GPU, multi-GPU DDP, and Apple MPS backends
  • Weights & Biases integration is optional via the --wandb_log flag

Key Features

  • Model definition (model.py) and training loop (train.py) are each roughly 300 lines of Python
  • Reproduces published GPT-2 results at research-grade quality
  • Flash Attention support via PyTorch 2.0 scaled_dot_product_attention
  • Sampling script generates text from trained checkpoints immediately (see the sketch after this list)
  • Clean separation of data prep, training, and inference stages
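
The sampling loop itself is short: repeatedly feed the current context through the model, scale the last position's logits by a temperature, optionally keep only the top-k candidates, and sample the next token. A condensed sketch in the style of nanoGPT's generate method, assuming a model whose forward pass returns logits of shape (batch, time, vocab_size):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size, temperature=0.8, top_k=200):
    """Autoregressive sampling sketch; idx is a (B, T) tensor of prompt token ids."""
    model.eval()
    for _ in range(max_new_tokens):
        # crop the context to the model's block size
        idx_cond = idx[:, -block_size:]
        logits = model(idx_cond)              # (B, T, vocab_size) -- assumed signature
        logits = logits[:, -1, :] / temperature
        if top_k is not None:
            # drop everything below the k-th largest logit
            v, _ = torch.topk(logits, min(top_k, logits.size(-1)))
            logits[logits < v[:, [-1]]] = -float("inf")
        probs = F.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)  # one new token per sequence
        idx = torch.cat((idx, next_id), dim=1)
    return idx
```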

Comparison with Similar Tools

  • Hugging Face Transformers — full-featured library with thousands of models; nanoGPT is purpose-built for learning and small experiments
  • Megatron-LM — NVIDIA's large-scale training framework; far more complex, targets multi-node clusters
  • LitGPT — Lightning-based GPT training; adds configuration abstractions nanoGPT deliberately avoids
  • minGPT — Karpathy's earlier project; nanoGPT is the faster, more optimized successor

FAQ

Q: Can I train a production-grade LLM with nanoGPT? A: It is optimized for learning and reproducing GPT-2. For production-scale training, frameworks like Megatron-LM or LLaMA-Factory are more appropriate.

Q: What hardware do I need? A: A single consumer GPU with 8 GB VRAM can train the character-level Shakespeare model. Reproducing GPT-2 124M requires multiple A100s.

Q: Does it support LoRA or adapter-based fine-tuning? A: Not natively. The codebase does full-parameter fine-tuning. Community forks add PEFT methods.

Q: Is the code actively maintained? A: The repository is intentionally minimal and stable. Updates are infrequent by design.
