# nanoGPT: The Simplest Repository for Training Medium-Sized GPTs

> A minimal, readable codebase for training and fine-tuning GPT-class models from scratch. Written by Andrej Karpathy, nanoGPT strips away framework complexity so you can understand every line of the training loop.

## Install

Save the content below to `.claude/skills/` or append to your `CLAUDE.md`.

## Quick Use

```bash
pip install torch numpy transformers datasets tiktoken wandb tqdm
git clone https://github.com/karpathy/nanoGPT.git && cd nanoGPT
python data/shakespeare_char/prepare.py
python train.py config/train_shakespeare_char.py
```

## Introduction

nanoGPT is a minimal PyTorch reimplementation of GPT training created by Andrej Karpathy. It aims to be the simplest, most readable code for training and fine-tuning medium-sized GPT models, making transformer internals accessible to anyone who can read Python.

## What nanoGPT Does

- Trains GPT-2 scale models from scratch on custom datasets
- Reproduces GPT-2 (124M) on OpenWebText in about 4 days on 8x A100 GPUs
- Supports character-level and BPE tokenization
- Provides data preparation scripts for Shakespeare, OpenWebText, and custom corpora
- Enables fine-tuning pre-trained GPT-2 checkpoints on new data

## Architecture Overview

The entire model is defined in a single `model.py` file that implements a standard decoder-only transformer with causal self-attention, GELU activations, and optional Flash Attention. Training logic lives in `train.py`, which uses PyTorch DDP for multi-GPU runs and `torch.amp` for mixed precision. Configuration lives in plain Python files whose assignments override the defaults.
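
The "plain Python files that override defaults" idea can be sketched in a few lines. This is a simplified illustration of the pattern, not the repository's actual code: `DEFAULTS`, `load_config`, and the example keys are hypothetical names chosen for this sketch, and it assumes config files are trusted, since they are executed as Python.

```python
import ast

# Hypothetical defaults for illustration only; the real defaults live in the repository.
DEFAULTS = {"batch_size": 12, "learning_rate": 6e-4, "device": "cuda", "compile": True}


def load_config(defaults, config_source="", cli_args=()):
    """Return a copy of `defaults`, overridden by a config file's source
    and then by `--key=value` command-line arguments."""
    config = dict(defaults)

    # 1) A config file is plain Python: execute it and collect its assignments.
    #    Simplification: every name the file defines must be a known key.
    scope = {}
    exec(config_source, {}, scope)  # only safe for trusted config files
    for key, value in scope.items():
        if key not in config:
            raise KeyError(f"unknown config key: {key}")
        config[key] = value

    # 2) Command-line overrides take precedence over the config file.
    for arg in cli_args:
        if not arg.startswith("--") or "=" not in arg:
            raise ValueError(f"expected --key=value, got: {arg}")
        key, raw = arg[2:].split("=", 1)
        if key not in config:
            raise KeyError(f"unknown config key: {key}")
        try:
            value = ast.literal_eval(raw)  # numbers, booleans, quoted strings
        except (ValueError, SyntaxError):
            value = raw  # bare words such as cpu stay strings
        if type(value) is not type(config[key]):
            raise TypeError(f"{key} expects {type(config[key]).__name__}")
        config[key] = value

    return config


# Stand-in for the text of a config file such as config/train_shakespeare_char.py.
file_source = "batch_size = 64\nlearning_rate = 1e-3\n"
cfg = load_config(DEFAULTS, file_source, ["--device=cpu", "--compile=False"])
print(cfg)
```

The ordering is the design choice worth noting: defaults first, then the file, then the command line, so a one-off experiment can change a single value without editing any file.
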
## Self-Hosting & Configuration

- Requires Python 3.10+ and PyTorch 2.0+
- Data preparation scripts convert raw text into memory-mapped binary token arrays
- Config files set model size, learning rate, batch size, and device count
- Supports single GPU, multi-GPU DDP, and Apple MPS backends
- Weights & Biases logging is optional and is enabled with `--wandb_log=True`

## Key Features

- `train.py` and `model.py` are each roughly 300 lines of Python
- Reproduces the published GPT-2 (124M) results on OpenWebText
- Flash Attention support via PyTorch 2.0 `scaled_dot_product_attention`
- Sampling script generates text from trained checkpoints immediately
- Clean separation of data prep, training, and inference stages

## Comparison with Similar Tools

- **Hugging Face Transformers**: full-featured library with thousands of models; nanoGPT is purpose-built for learning and small experiments
- **Megatron-LM**: NVIDIA's large-scale training framework; far more complex, targets multi-node clusters
- **LitGPT**: Lightning-based GPT training; adds configuration abstractions that nanoGPT deliberately avoids
- **minGPT**: Karpathy's earlier project; nanoGPT is the faster, more optimized successor

## FAQ

**Q: Can I train a production-grade LLM with nanoGPT?**
A: It is optimized for learning and for reproducing GPT-2. For production-scale training, frameworks like Megatron-LM or LLaMA-Factory are more appropriate.

**Q: What hardware do I need?**
A: A single consumer GPU with 8 GB VRAM can train the character-level Shakespeare model. Reproducing GPT-2 124M requires multiple A100s.

**Q: Does it support LoRA or adapter-based fine-tuning?**
A: Not natively. The codebase does full-parameter fine-tuning. Community forks add PEFT methods.

**Q: Is the code actively maintained?**
A: The repository is intentionally minimal and stable. Updates are infrequent by design.
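
The data preparation step listed under Self-Hosting & Configuration (raw text converted into memory-mapped binary token arrays) can be illustrated with a small character-level sketch. The repository's own scripts rely on NumPy for this; the version below uses only the standard library so that it runs without dependencies. The helper names (`build_vocab`, `write_tokens`, `read_block`), the `train.bin` file name, and the 16-bit token width are assumptions made for this illustration.

```python
import mmap
import os
import tempfile
from array import array


def build_vocab(text):
    """Map each distinct character to an integer id (sorted for determinism)."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def write_tokens(path, ids):
    """Store token ids as a flat array of unsigned 16-bit integers.

    Assumption for this sketch: every id fits in 16 bits ("H" type code).
    """
    with open(path, "wb") as f:
        array("H", ids).tofile(f)


def read_block(path, start, length):
    """Memory-map the file and copy out `length` ids beginning at `start`.

    Only the pages backing the requested slice need to be read from disk,
    which is the point of memory-mapping a large token file.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            with memoryview(mm) as raw, raw.cast("H") as view:
                return view[start:start + length].tolist()


text = "to be or not to be"
stoi, itos = build_vocab(text)
ids = [stoi[ch] for ch in text]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train.bin")  # hypothetical file name
    write_tokens(path, ids)
    block = read_block(path, start=3, length=5)

decoded = "".join(itos[i] for i in block)
print(decoded)  # prints "be or"
```

A training loop built on this layout can sample random offsets into the file and read fixed-length blocks without ever loading the whole dataset into memory.
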
## Sources

- https://github.com/karpathy/nanoGPT
- https://github.com/karpathy/nanoGPT/blob/master/README.md

---

Source: https://tokrepo.com/en/workflows/nanogpt-simplest-repository-training-medium-sized-gpts-63ba44df

Author: Script Depot