Introduction
nanochat is an open-source project by Andrej Karpathy that demonstrates how to train a functional chatbot end to end for under $100 in compute. It serves as both an educational resource and a practical starting point for training small language models with modern techniques.
What nanochat Does
- Trains a capable chatbot model from scratch on consumer hardware
- Implements efficient training techniques that minimize compute requirements
- Provides a complete pipeline from data preparation to interactive chat inference
- Includes instruction tuning and RLHF-style alignment on a budget
- Offers a reference implementation for understanding LLM training internals
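The instruction-tuning stage above implies rendering multi-turn conversations into a single training string. A minimal sketch, assuming hypothetical `<|user|>`/`<|assistant|>`/`<|end|>` delimiters; nanochat's real special tokens and template may differ:

```python
# Render a conversation into one training string with a simple chat
# template. The delimiter tokens here are illustrative assumptions,
# not nanochat's actual format. Only user/assistant roles are handled.

def render_chat(messages):
    parts = []
    for msg in messages:
        tag = "<|user|>" if msg["role"] == "user" else "<|assistant|>"
        parts.append(f"{tag}{msg['content']}<|end|>")
    return "".join(parts)

conversation = [
    {"role": "user", "content": "Hi!"},
    {"role": "assistant", "content": "Hello, how can I help?"},
]
print(render_chat(conversation))
```

During supervised fine-tuning, a string like this is tokenized and the loss is typically computed only on the assistant spans.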
Architecture Overview
nanochat implements a transformer-based language model with a streamlined training pipeline. It uses a custom data loading system optimized for small-scale training, mixed-precision training with gradient accumulation, and a multi-stage pipeline covering pretraining, supervised fine-tuning, and preference optimization. The codebase is intentionally minimal to serve as a readable reference.
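The gradient accumulation mentioned above can be sketched in plain Python with a toy one-parameter least-squares model; the model, learning rate, and step counts are illustrative, not nanochat's actual code:

```python
# Gradient accumulation: average gradients over `accum_steps`
# micro-batches before each parameter update, so the effective batch
# size is micro_batch * accum_steps without extra memory.

def grad_loss(w, x, y):
    # d/dw of 0.5 * (w*x - y)^2
    return (w * x - y) * x

def train(data, w=0.0, lr=0.1, accum_steps=4):
    accum = 0.0
    for i, (x, y) in enumerate(data, start=1):
        # Scale each micro-batch gradient so the sum is an average.
        accum += grad_loss(w, x, y) / accum_steps
        if i % accum_steps == 0:
            w -= lr * accum  # one optimizer step per accum_steps batches
            accum = 0.0
    return w
```

In the real pipeline the same pattern appears with PyTorch tensors, where mixed precision additionally casts forward-pass activations to a lower-precision dtype to save memory and bandwidth.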
Self-Hosting & Configuration
- Requires Python 3.10+ with PyTorch and a CUDA-capable GPU (RTX 3090 or better recommended)
- Training configs are YAML files specifying model size, data paths, and hyperparameters
- Pretrained checkpoints are available for skipping the pretraining phase
- Inference runs on consumer GPUs or CPUs (slower) for interactive chat
- No cloud dependencies; the entire pipeline runs on a single machine
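As a concrete illustration of the YAML configs described above, a minimal training config might look like the following; every field name is a hypothetical example, not nanochat's actual schema:

```yaml
# Hypothetical config sketch; field names are illustrative only.
model:
  n_layer: 12
  n_head: 12
  n_embd: 768
data:
  train_path: data/train.bin
  val_path: data/val.bin
training:
  micro_batch_size: 16
  grad_accum_steps: 8
  learning_rate: 3.0e-4
  max_steps: 20000
  precision: bf16
```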
Key Features
- Complete LLM training pipeline in a minimal, readable codebase
- Budget-friendly: full training from scratch costs under $100 in GPU compute
- Multi-stage training covering pretraining, SFT, and preference optimization
- Educational code with clear documentation explaining each component
- Checkpoint compatibility with common inference frameworks for deployment
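Preference optimization can take several forms; as one common illustration (not necessarily the loss nanochat uses), a DPO-style objective trains the policy to score a chosen response above a rejected one relative to a frozen reference model:

```python
import math

# DPO-style loss for one preference pair: -log sigmoid(beta * margin),
# where the margin measures how much more the policy prefers the chosen
# response than the reference model does. Inputs are the summed
# log-probabilities of each full response; beta is a strength knob.
# All names and values here are illustrative.

def dpo_loss(policy_chosen, policy_rejected,
             ref_chosen, ref_rejected, beta=0.1):
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When the policy widens the chosen-over-rejected gap beyond the reference's, the margin is positive and the loss falls toward zero; a negative margin pushes the loss up.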
Comparison with Similar Tools
- minimind — similar educational LLM trainer; nanochat includes alignment and chat-specific training stages
- nanoGPT — Karpathy's earlier project for pretraining only; nanochat extends to full chat model training
- llama.cpp — inference-focused; nanochat covers the training side of the pipeline
- Axolotl — fine-tuning toolkit; nanochat provides the full training stack from scratch
FAQ
Q: What GPU is needed for training? A: An RTX 3090 or 4090 is sufficient for the default model configuration.
Q: Can I use my own training data? A: Yes. The data pipeline accepts JSONL formatted conversation data.
Q: How does the output quality compare to commercial models? A: nanochat produces a capable conversational model, though it does not match frontier models trained on much larger budgets.
Q: Is this suitable for production deployment? A: nanochat is primarily educational. For production, consider fine-tuning a larger pretrained model.
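The JSONL conversation format mentioned in the FAQ can be loaded with the standard library alone; the `messages`/`role`/`content` field names below are assumptions about the schema, not nanochat's documented format:

```python
import io
import json

# Parse JSONL conversation data: one JSON object per line, blank lines
# skipped. Field names ("messages", "role", "content") are illustrative
# assumptions about the conversation schema.

def load_conversations(fp):
    return [json.loads(line) for line in fp if line.strip()]

sample = io.StringIO(
    '{"messages": [{"role": "user", "content": "Hi"}, '
    '{"role": "assistant", "content": "Hello!"}]}\n'
)
conversations = load_conversations(sample)
```

In practice the file handle would come from `open("data.jsonl")` rather than an in-memory `StringIO`.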