Introduction
micrograd is an extremely small autograd engine that implements backpropagation over a dynamically built computation graph. Created by Andrej Karpathy, it demonstrates how frameworks like PyTorch compute gradients under the hood, making it one of the most popular educational tools for understanding deep learning internals.
What micrograd Does
- Implements reverse-mode automatic differentiation in pure Python
- Builds a dynamic computation graph of scalar-valued operations
- Supports forward pass computation and backward pass gradient calculation
- Includes a small neural network library built on top of the engine
- Provides a complete example of training a binary classifier
Architecture Overview
The core is a Value class that wraps a scalar float and tracks the operations applied to it. Each operation records its inputs and a local gradient function. Calling backward() on a final Value triggers a topological sort of the graph followed by reverse-mode gradient accumulation. Simple neural network modules (Neuron, Layer, MLP) are built on top using these Value objects.
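The pattern is compact enough to sketch. The following is a condensed illustration of the idea (add and multiply only, operands assumed to already be Values), not micrograd's exact source: each operation returns a new Value that remembers its parents and a closure encoding the local derivative, and backward() replays those closures in reverse topological order.

```python
class Value:
    def __init__(self, data, _children=()):
        self.data = data          # the wrapped scalar
        self.grad = 0.0           # d(output)/d(this node), filled by backward()
        self._prev = set(_children)
        self._backward = lambda: None  # closure set by the op that made this node

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # d(a+b)/da = 1 and d(a+b)/db = 1, scaled by the chain rule
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # d(a*b)/da = b and d(a*b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # topological sort, then apply the chain rule in reverse order
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0  # seed: d(output)/d(output) = 1
        for v in reversed(topo):
            v._backward()
```

micrograd's real engine layers operand wrapping, `__pow__`, `relu()`, and the reflected operators (`__radd__`, `__rmul__`) onto this same skeleton.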
Installation & Configuration
- Install from PyPI: `pip install micrograd` (then try the quick example after this list)
- Or clone the repo for the full source and notebooks
- No configuration needed; the entire library is two files
- Run the included Jupyter notebook for a visual walkthrough
- Requires only Python 3.6+; the core engine has no external dependencies (the demo notebooks pull in a few extras such as numpy and matplotlib)
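As a quick smoke test after installing, you can build a small expression and backpropagate through it. This mirrors the style of the usage example in the project README; the particular values below are just for illustration.

```python
from micrograd.engine import Value

a = Value(2.0)
b = Value(-3.0)
c = a * b + b        # c = -9.0
e = c * c            # e = 81.0
e.backward()         # fills in .grad on every node in the graph

print(e.data)        # 81.0
print(a.grad)        # de/da = 2*c*b   =  54.0
print(b.grad)        # de/db = 2*c*(a+1) = -54.0
```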
Key Features
- Entire autograd engine fits in about 150 lines of code
- Dynamic computation graph like PyTorch (define-by-run)
- Clean implementation of backpropagation with topological sort
- Includes an MLP implementation for classification tasks (see the training sketch after this list)
- Accompanied by a detailed YouTube lecture with millions of views
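A minimal training loop with the nn module might look like the following sketch. The toy dataset, layer sizes, learning rate, and step count are all invented for illustration; the API surface (MLP, parameters(), zero_grad()) is the one the library exposes.

```python
from micrograd.nn import MLP

# hypothetical toy dataset: 2-D points with +1/-1 labels
xs = [[2.0, 3.0], [-1.0, -2.0], [3.0, -1.0], [-2.0, 1.0]]
ys = [1.0, -1.0, 1.0, -1.0]

model = MLP(2, [8, 8, 1])  # 2 inputs -> two hidden layers of 8 -> 1 output

for step in range(50):
    # forward pass: mean squared error over the toy set
    preds = [model(x) for x in xs]
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

    # backward pass: zero stale grads, then backprop through the graph
    model.zero_grad()
    loss.backward()

    # plain SGD update on the raw scalar parameters
    for p in model.parameters():
        p.data -= 0.05 * p.grad
```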
Comparison with Similar Tools
- PyTorch — full production framework with GPU support; micrograd is a teaching tool
- JAX — functional autodiff with JIT compilation; micrograd is imperative and minimal
- Tinygrad — small but GPU-capable framework; micrograd is CPU-only and simpler
- Autograd (Harvard) — Python autograd for NumPy; micrograd operates on individual scalars
FAQ
Q: Can micrograd train real neural networks? A: It can train small networks on toy datasets. It operates on scalars, not tensors, so it is too slow for real workloads.
Q: Does it support GPU acceleration? A: No, it is pure Python operating on scalar values. It is meant for understanding, not performance.
Q: What is the best way to learn from micrograd? A: Watch the accompanying YouTube lecture, then read the engine.py source code line by line.
Q: Can I extend micrograd with new operations? A: Yes, adding new ops requires defining the forward computation and the local gradient in the Value class.
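For example, here is how one might add an exp() operation, following the conventions of the current engine.py (the _children tuple and the _backward closure). Monkey-patching is shown only for brevity; in practice you would add the method to the Value class in engine.py.

```python
import math
from micrograd.engine import Value

def exp(self):
    # forward: compute e^x and record this node's parent in the graph
    out = Value(math.exp(self.data), (self,), 'exp')

    def _backward():
        # local gradient: d/dx e^x = e^x, scaled by the chain rule
        self.grad += out.data * out.grad
    out._backward = _backward
    return out

Value.exp = exp  # attached here for demonstration purposes

x = Value(1.0)
y = x.exp()
y.backward()
print(y.data, x.grad)  # both approximately e = 2.71828...
```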