
nanoGPT — The Simplest Repository for Training Medium-Sized GPTs

A minimal, readable codebase for training and fine-tuning GPT-class models from scratch. Written by Andrej Karpathy, nanoGPT strips away framework complexity so you can understand every line of the training loop.

Introduction

nanoGPT is a minimal PyTorch reimplementation of GPT training created by Andrej Karpathy. It is designed to be the simplest, most readable code for training and fine-tuning medium-sized GPT models, making transformer internals accessible to anyone who can read Python.

What nanoGPT Does

  • Trains GPT-2 scale models from scratch on custom datasets
  • Reproduces GPT-2 (124M) on OpenWebText in about 4 days on 8x A100 GPUs
  • Supports character-level and BPE tokenization (see the tokenization sketch after this list)
  • Provides data preparation scripts for Shakespeare, OpenWebText, and custom corpora
  • Enables fine-tuning pre-trained GPT-2 checkpoints on new data
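
Both tokenization paths produce integer token ids but differ sharply in vocabulary size and sequence length. A minimal sketch, assuming the tiktoken package is installed; the stoi/itos names follow the character-level prepare script, while the exact corpus text here is made up:

```python
# Minimal sketch of the two tokenization paths used by the prepare scripts.
import tiktoken

text = "To be, or not to be"

# Character-level: build a vocabulary from the unique characters in the corpus.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}
char_ids = [stoi[c] for c in text]        # one token per character

# BPE: reuse GPT-2's byte-pair-encoding vocabulary (50,257 tokens).
enc = tiktoken.get_encoding("gpt2")
bpe_ids = enc.encode_ordinary(text)       # a handful of tokens for the whole phrase

print(len(char_ids), len(bpe_ids))        # character-level yields far more tokens
```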

Architecture Overview

The entire model is defined in a single model.py file implementing a standard decoder-only transformer with causal self-attention, GELU activations, and optional Flash Attention. Training logic lives in train.py, which uses PyTorch DistributedDataParallel (DDP) for multi-GPU runs and mixed precision via torch.amp. Configuration is handled by plain Python files whose settings override the defaults in train.py.
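
That description maps onto a short PyTorch module. A condensed, illustrative sketch of one such block, assuming PyTorch 2.0 for scaled_dot_product_attention; names like n_embd and n_head follow common GPT conventions rather than quoting model.py:

```python
# Decoder-only transformer block: causal self-attention via PyTorch 2.0's
# scaled_dot_product_attention (which dispatches to Flash Attention when
# available) followed by a GELU MLP, each with a residual connection.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Block(nn.Module):
    def __init__(self, n_embd: int = 768, n_head: int = 12):
        super().__init__()
        self.n_head = n_head
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn_qkv = nn.Linear(n_embd, 3 * n_embd)
        self.attn_proj = nn.Linear(n_embd, n_embd)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.GELU(), nn.Linear(4 * n_embd, n_embd)
        )

    def forward(self, x):                     # x: (batch, seq, n_embd)
        b, t, c = x.shape
        q, k, v = self.attn_qkv(self.ln1(x)).split(c, dim=2)
        # reshape to (batch, heads, seq, head_dim) for the fused attention kernel
        q, k, v = (z.view(b, t, self.n_head, c // self.n_head).transpose(1, 2)
                   for z in (q, k, v))
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)  # causal mask applied internally
        y = y.transpose(1, 2).contiguous().view(b, t, c)
        x = x + self.attn_proj(y)
        x = x + self.mlp(self.ln2(x))
        return x
```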

Self-Hosting & Configuration

  • Requires Python 3.10+ and PyTorch 2.0+
  • Data preparation scripts convert raw text into memory-mapped binary token arrays (a minimal sketch follows this list)
  • Config files set model size, learning rate, batch size, and device count
  • Supports single GPU, multi-GPU DDP, and Apple MPS backends
  • Weights & Biases integration is optional via the --wandb_log flag

Key Features

  • Training loop (train.py) and model definition (model.py) are each roughly 300 lines of Python
  • Reproduces published GPT-2 results at research-grade quality
  • Flash Attention support via PyTorch 2.0 scaled_dot_product_attention
  • Sampling script generates text from trained checkpoints immediately (see the decoding sketch after this list)
  • Clean separation of data prep, training, and inference stages
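
A sketch of that decoding loop, under the assumption that model maps a (batch, seq) tensor of token ids to (batch, seq, vocab) logits; the temperature and top-k values are illustrative defaults, not a quotation of the sampling script:

```python
# Autoregressive sampling with temperature scaling and top-k filtering.
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx, max_new_tokens=100, temperature=0.8, top_k=200, block_size=1024):
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]                   # crop to the context window
        logits = model(idx_cond)[:, -1, :] / temperature  # logits for the last position
        if top_k is not None:
            v, _ = torch.topk(logits, min(top_k, logits.size(-1)))
            logits[logits < v[:, [-1]]] = -float("inf")   # keep only the top-k candidates
        probs = F.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1) # sample one token per sequence
        idx = torch.cat((idx, next_id), dim=1)            # append and continue
    return idx
```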

Comparison with Similar Tools

  • Hugging Face Transformers — full-featured library with thousands of models; nanoGPT is purpose-built for learning and small experiments
  • Megatron-LM — NVIDIA's large-scale training framework; far more complex, targets multi-node clusters
  • LitGPT — Lightning-based GPT training; adds configuration abstractions nanoGPT deliberately avoids
  • minGPT — Karpathy's earlier project; nanoGPT is the faster, more optimized successor

FAQ

Q: Can I train a production-grade LLM with nanoGPT? A: It is optimized for learning and reproducing GPT-2. For production-scale training, frameworks like Megatron-LM or LLaMA-Factory are more appropriate.

Q: What hardware do I need? A: A single consumer GPU with 8 GB VRAM can train the character-level Shakespeare model. Reproducing GPT-2 124M requires multiple A100s.

Q: Does it support LoRA or adapter-based fine-tuning? A: Not natively. The codebase does full-parameter fine-tuning. Community forks add PEFT methods.

Q: Is the code actively maintained? A: The repository is intentionally minimal and stable. Updates are infrequent by design.
