Cette page est affichée en anglais. Une traduction française est en cours.
SkillsMay 13, 2026·3 min de lecture

nanochat — Affordable Open-Source ChatGPT by Karpathy

An open-source project by Andrej Karpathy demonstrating how to build a capable chatbot for under $100 in compute, using efficient training techniques on small models.

Prêt pour agents

Cet actif peut être lu et installé directement par les agents

TokRepo expose une commande CLI universelle, un contrat d'installation, le metadata JSON, un plan selon l'adaptateur et le contenu raw pour aider les agents à juger l'adaptation, le risque et les prochaines actions.

Native · 98/100Policy : autoriser
Surface agent
Tout agent MCP/CLI
Type
Skill
Installation
Single
Confiance
Confiance : Established
Point d'entrée
nanochat Overview
Commande CLI universelle
npx tokrepo install cf04e473-4f09-11f1-9bc6-00163e2b0d79

Introduction

nanochat is an open-source project by Andrej Karpathy that demonstrates building a functional chatbot for under $100 in compute costs. It serves as both an educational resource and a practical starting point for training small language models with modern techniques.

What nanochat Does

  • Trains a capable chatbot model from scratch on consumer hardware
  • Implements efficient training techniques that minimize compute requirements
  • Provides a complete pipeline from data preparation to interactive chat inference
  • Includes instruction tuning and RLHF-style alignment on a budget
  • Offers a reference implementation for understanding LLM training internals

Architecture Overview

nanochat implements a transformer-based language model with a streamlined training pipeline. It uses a custom data loading system optimized for small-scale training, mixed-precision training with gradient accumulation, and a multi-stage pipeline covering pretraining, supervised fine-tuning, and preference optimization. The codebase is intentionally minimal to serve as a readable reference.

Self-Hosting & Configuration

  • Requires Python 3.10+ with PyTorch and a CUDA-capable GPU (RTX 3090 or better recommended)
  • Training configs are YAML files specifying model size, data paths, and hyperparameters
  • Pretrained checkpoints are available for skipping the pretraining phase
  • Inference runs on consumer GPUs or CPUs (slower) for interactive chat
  • No cloud dependencies; the entire pipeline runs on a single machine

Key Features

  • Complete LLM training pipeline in a minimal, readable codebase
  • Budget-friendly: full training from scratch costs under $100 in GPU compute
  • Multi-stage training covering pretraining, SFT, and preference optimization
  • Educational code with clear documentation explaining each component
  • Checkpoint compatibility with common inference frameworks for deployment

Comparison with Similar Tools

  • minimind — similar educational LLM trainer; nanochat includes alignment and chat-specific training stages
  • nanoGPT — Karpathy's earlier project for pretraining only; nanochat extends to full chat model training
  • llama.cpp — inference-focused; nanochat covers the training side of the pipeline
  • Axolotl — fine-tuning toolkit; nanochat provides the full training stack from scratch

FAQ

Q: What GPU is needed for training? A: An RTX 3090 or 4090 is sufficient for the default model configuration.

Q: Can I use my own training data? A: Yes. The data pipeline accepts JSONL formatted conversation data.

Q: How does the output quality compare to commercial models? A: nanochat produces a capable conversational model, though it does not match frontier models trained on much larger budgets.

Q: Is this suitable for production deployment? A: nanochat is primarily educational. For production, consider fine-tuning a larger pretrained model.

Sources

Fil de discussion

Connectez-vous pour rejoindre la discussion.
Aucun commentaire pour l'instant. Soyez le premier à partager votre avis.

Actifs similaires