# nanoGPT — The Simplest Repository for Training Medium-Sized GPTs

> A minimal, readable codebase for training and fine-tuning GPT-class models from scratch. Written by Andrej Karpathy, nanoGPT strips away framework complexity so you can understand every line of the training loop.

## Install

Save the content below to `.claude/skills/` or append to your `CLAUDE.md`:

## Quick Use
```bash
pip install torch numpy transformers datasets tiktoken wandb tqdm
git clone https://github.com/karpathy/nanoGPT.git && cd nanoGPT
python data/shakespeare_char/prepare.py
python train.py config/train_shakespeare_char.py
```

## Introduction
nanoGPT is a minimal PyTorch reimplementation of GPT training created by Andrej Karpathy. It is designed to be the simplest, most readable code for training and fine-tuning medium-sized GPT models, making transformer internals accessible to anyone who can read Python.

## What nanoGPT Does
- Trains GPT-2 scale models from scratch on custom datasets
- Reproduces GPT-2 (124M) on OpenWebText in about 4 days on 8x A100 GPUs
- Supports character-level and BPE tokenization
- Provides data preparation scripts for Shakespeare, OpenWebText, and custom corpora
- Enables fine-tuning pre-trained GPT-2 checkpoints on new data
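Character-level tokenization, mentioned above, amounts to mapping each distinct character to an integer id. The following is a minimal sketch of that idea, not the repository's own preparation script; the helper names `build_char_vocab`, `encode`, and `decode` are hypothetical.

```python
# Illustrative sketch only: helper names are hypothetical, not nanoGPT's API.

def build_char_vocab(text):
    """Map each distinct character to an integer id, and back."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(text, stoi):
    """Turn a string into a list of integer token ids."""
    return [stoi[ch] for ch in text]


def decode(ids, itos):
    """Turn a list of integer token ids back into a string."""
    return "".join(itos[i] for i in ids)


corpus = "to be or not to be"
stoi, itos = build_char_vocab(corpus)
ids = encode("to be", stoi)
print(len(stoi), ids, decode(ids, itos))
```

The vocabulary here is only as large as the set of characters in the corpus. A BPE tokenizer trades a larger vocabulary for shorter token sequences.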

## Architecture Overview
The entire model is defined in a single `model.py` file implementing a standard decoder-only transformer with causal self-attention, GELU activations, and optional Flash Attention. Training logic lives in `train.py`, which uses PyTorch DDP for multi-GPU runs and `torch.amp` for mixed precision. Configuration is plain Python files whose values override the defaults.
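To make "causal self-attention" concrete, here is a single-head sketch in NumPy. It is an illustration under simplifying assumptions (one head, no dropout, no learned projections) and is not the code in `model.py`, which is written in PyTorch. The function name `causal_self_attention` is hypothetical.

```python
import numpy as np


def causal_self_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v have shape (T, d). Position t may attend only to positions <= t.
    Returns the attended values and the attention weights.
    """
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)                       # (T, T) similarity scores
    future = np.triu(np.ones((T, T), dtype=bool), k=1)  # True strictly above the diagonal
    scores = np.where(future, -np.inf, scores)          # hide future positions
    scores = scores - scores.max(axis=-1, keepdims=True)  # stabilise the softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))          # 4 positions, 8 features
out, weights = causal_self_attention(x, x, x)
print(out.shape, weights.round(2))
```

Each row of `weights` sums to 1 and is zero to the right of the diagonal, so the first position can only copy its own value vector.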

## Self-Hosting & Configuration
- Requires Python 3.10+ and PyTorch 2.0+ (`torch.compile` is on by default; pass `--compile=False` to disable it)
- Data preparation scripts convert raw text into memory-mapped binary token arrays
- Config files set model size, learning rate, batch size, and target device; the GPU count for DDP comes from the `torchrun` launch command
- Supports single GPU, multi-GPU DDP, and Apple MPS backends
- Weights & Biases logging is optional and off by default; enable it with `--wandb_log=True`
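The "Python config file plus `--key=value` overrides" pattern can be sketched as follows. This is a simplified illustration of the idea, not the repository's configurator. The function `apply_overrides`, the file name `small_config.py`, and the default values shown are assumptions made for the example.

```python
import ast
import os
import tempfile


def apply_overrides(defaults, args):
    """Return a copy of `defaults` updated from config files and --key=value args."""
    config = dict(defaults)
    for arg in args:
        if "=" not in arg:
            # A bare argument is treated as the path to a Python config file.
            namespace = {}
            with open(arg) as f:
                exec(f.read(), {}, namespace)
            for key, value in namespace.items():
                if key in config:
                    config[key] = value
        else:
            # --key=value overrides a single known setting.
            key, raw = arg.lstrip("-").split("=", 1)
            if key not in config:
                raise ValueError(f"Unknown config key: {key}")
            try:
                value = ast.literal_eval(raw)  # "1e-3" -> float, "True" -> bool
            except (ValueError, SyntaxError):
                value = raw                    # otherwise keep the plain string
            config[key] = value
    return config


# Hypothetical defaults, chosen only for this example.
defaults = {"n_layer": 12, "learning_rate": 6e-4, "batch_size": 12,
            "device": "cuda", "wandb_log": False}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "small_config.py")  # hypothetical config file
    with open(path, "w") as f:
        f.write("n_layer = 6\nlearning_rate = 1e-3\n")
    config = apply_overrides(defaults, [path, "--device=cpu", "--wandb_log=True"])

print(config)
```

Later arguments win over earlier ones, so a command-line flag can adjust a single value from a config file without editing it.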

## Key Features
- `train.py` and `model.py` are each roughly 300 lines of Python
- Reproduces GPT-2 (124M) on OpenWebText, reaching a validation loss of about 2.85 according to the project README
- Flash Attention support via PyTorch 2.0 `scaled_dot_product_attention`
- `sample.py` generates text from a trained checkpoint or from pre-trained GPT-2 weights
- Clean separation of data prep, training, and inference stages
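The separation between data prep and training rests on a simple hand-off: the prep stage writes token ids to a flat binary file, and the training stage reads random windows from it through a memory map. The sketch below shows that hand-off with NumPy. The helper names `write_tokens` and `get_batch`, the `uint16` dtype, and the file name are assumptions for illustration, not a copy of the repository's code.

```python
import os
import tempfile

import numpy as np


def write_tokens(path, token_ids):
    """Data prep stage: store token ids as a flat uint16 binary file."""
    np.array(token_ids, dtype=np.uint16).tofile(path)


def get_batch(path, block_size, batch_size, rng):
    """Training stage: sample (input, target) windows without loading the whole file."""
    data = np.memmap(path, dtype=np.uint16, mode="r")
    starts = rng.integers(0, len(data) - block_size, size=batch_size)
    # Targets are the inputs shifted one token to the right.
    x = np.stack([np.asarray(data[s:s + block_size], dtype=np.int64) for s in starts])
    y = np.stack([np.asarray(data[s + 1:s + 1 + block_size], dtype=np.int64) for s in starts])
    return x, y


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train.bin")
    write_tokens(path, range(100))  # stand-in for real token ids
    x, y = get_batch(path, block_size=8, batch_size=4, rng=np.random.default_rng(0))

print(x.shape, y.shape)
```

Because only the sampled windows are copied into memory, the same reader works for corpora far larger than RAM.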

## Comparison with Similar Tools
- **Hugging Face Transformers** — full-featured library with thousands of models; nanoGPT is purpose-built for learning and small experiments
- **Megatron-LM** — NVIDIA's large-scale training framework; far more complex, targets multi-node clusters
- **LitGPT** — Lightning-based GPT training; adds configuration abstractions nanoGPT deliberately avoids
- **minGPT** — Karpathy's earlier project; nanoGPT is the faster, more optimized successor

## FAQ
**Q: Can I train a production-grade LLM with nanoGPT?**
A: It is optimized for learning and reproducing GPT-2. For production-scale training, frameworks like Megatron-LM or LLaMA-Factory are more appropriate.

**Q: What hardware do I need?**
A: A single consumer GPU with 8 GB VRAM can train the character-level Shakespeare model. The GPT-2 124M reproduction cited above ran on 8x A100 GPUs for about 4 days, so fewer or smaller GPUs mean a longer run.
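For a sense of scale, the "124M" figure can be recovered by counting parameters from the published GPT-2 small hyperparameters (12 layers, width 768, vocabulary 50257, context 1024). The sketch below assumes biases on all linear layers and an output head that shares weights with the token embedding. The function name `gpt2_param_count` is hypothetical.

```python
def gpt2_param_count(n_layer, n_embd, vocab_size, block_size):
    """Count parameters of a GPT-2 style model with biases and a tied output head."""
    wte = vocab_size * n_embd   # token embeddings (shared with the output head)
    wpe = block_size * n_embd   # learned position embeddings
    layernorm = 2 * n_embd      # weight + bias
    # Attention: fused q/k/v projection plus output projection.
    attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # MLP: expand to 4x width, then project back.
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    per_block = 2 * layernorm + attn + mlp
    return wte + wpe + n_layer * per_block + layernorm  # plus final layernorm


total = gpt2_param_count(n_layer=12, n_embd=768, vocab_size=50257, block_size=1024)
print(f"{total:,} parameters")  # 124,439,808 parameters
```

At 4 bytes per parameter the weights alone take roughly 0.5 GB. Gradients, optimizer state, and activations add substantially to that during training.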

**Q: Does it support LoRA or adapter-based fine-tuning?**
A: Not natively. The codebase does full-parameter fine-tuning. Community forks add PEFT methods.

**Q: Is the code actively maintained?**
A: The repository is intentionally minimal and stable. Updates are infrequent by design.

## Sources
- https://github.com/karpathy/nanoGPT
- https://github.com/karpathy/nanoGPT/blob/master/README.md

---
Source: https://tokrepo.com/en/workflows/nanogpt-simplest-repository-training-medium-sized-gpts-63ba44df
Author: Script Depot