May 13, 2026 · 3 min read

Autoresearch — Automated AI Research Agents by Karpathy

An open-source system by Andrej Karpathy that uses AI agents to autonomously run machine learning research experiments on single-GPU setups.

Introduction

Autoresearch is an open-source project by Andrej Karpathy that automates machine learning research using AI agents. It enables researchers to define experiment hypotheses and let autonomous agents handle implementation, training, evaluation, and reporting on commodity single-GPU hardware.

What Autoresearch Does

  • Autonomously designs and runs ML training experiments based on high-level hypotheses
  • Generates code, trains models, and evaluates results without manual intervention
  • Produces structured research reports with metrics, charts, and analysis
  • Iterates on experiments by analyzing prior results and proposing follow-ups
  • Operates on single-GPU setups, making research accessible without large compute clusters

Architecture Overview

Autoresearch uses an agentic loop where an LLM-powered research agent receives a hypothesis, writes training code (based on nanochat or similar frameworks), executes it in a sandboxed environment, collects metrics, and decides whether to iterate or conclude. Results are logged in a structured format for human review.
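The agentic loop above can be sketched in a few lines. This is a minimal illustration, not Autoresearch's actual API: all names here (`ExperimentResult`, `run_experiment`, `research_loop`) and the stopping rule are assumptions standing in for the LLM-driven decision making.

```python
# Minimal sketch of an agentic research loop: generate code, run it in a
# sandbox, inspect metrics, and decide whether to iterate or conclude.
# All identifiers are illustrative, not Autoresearch's real interface.
import json
from dataclasses import dataclass


@dataclass
class ExperimentResult:
    metrics: dict
    logs: str = ""


def run_experiment(code: str) -> ExperimentResult:
    # Stand-in for sandboxed execution of generated training code.
    # A real run would train a model and collect actual metrics.
    return ExperimentResult(metrics={"val_loss": 2.1})


def research_loop(hypothesis: str, budget: int = 3) -> list[ExperimentResult]:
    """Iterate up to `budget` times, stopping early if the goal is met."""
    history = []
    for step in range(budget):
        # In the real system, an LLM would write this training script.
        code = f"# training script for: {hypothesis} (iteration {step})"
        result = run_experiment(code)
        history.append(result)
        # A fixed threshold stands in for the agent's judgment that the
        # hypothesis has been resolved.
        if result.metrics["val_loss"] < 2.0:
            break
    return history


results = research_loop("does a deeper MLP reduce val loss?")
print(json.dumps([r.metrics for r in results]))
```

Since the stubbed metric never crosses the threshold, the loop exhausts its full budget; a real agent would instead analyze the logged metrics to choose a follow-up experiment or conclude.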

Self-Hosting & Configuration

  • Requires Python 3.10+ and a CUDA-capable GPU for training experiments
  • Configure the LLM provider (OpenAI, Anthropic, or local models) via environment variables
  • Experiment configs are YAML files defining hypotheses, compute budgets, and evaluation criteria
  • Results are stored locally in a structured directory with logs, checkpoints, and reports
  • No cloud dependencies required; runs entirely on local hardware
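A YAML experiment config along the lines described above might look like the following. The field names are hypothetical, for illustration only; consult the project's own examples for the actual schema.

```yaml
# Hypothetical experiment config; field names are illustrative,
# not Autoresearch's documented schema.
hypothesis: "Does rotary positional encoding improve val loss at 10M params?"
compute_budget:
  max_gpu_hours: 4
  max_iterations: 3
evaluation:
  metric: val_loss
  target: 2.0
```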

Key Features

  • Fully autonomous research loop from hypothesis to final report
  • Built on nanochat for efficient small-model training experiments
  • Structured output format enables easy comparison across experiment runs
  • Cost-aware: designed for single-GPU research rather than large cluster workloads
  • Extensible agent architecture supports custom evaluation metrics and training frameworks
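Because each run writes its results in a structured layout, comparing experiments can be a simple directory scan. The layout assumed here (`<run-name>/metrics.json`) is a guess for illustration, not the project's documented format:

```python
# Sketch of comparing experiment runs from a structured results directory.
# The layout (<results_dir>/<run>/metrics.json) is an assumption.
import json
import tempfile
from pathlib import Path


def collect_metrics(results_dir: str) -> dict[str, dict]:
    """Map each run name to the metrics dict stored in its directory."""
    out = {}
    for metrics_file in sorted(Path(results_dir).glob("*/metrics.json")):
        out[metrics_file.parent.name] = json.loads(metrics_file.read_text())
    return out


# Usage with a throwaway directory and illustrative data:
with tempfile.TemporaryDirectory() as d:
    for name, loss in [("baseline", 2.3), ("rotary", 2.1)]:
        run = Path(d) / name
        run.mkdir()
        (run / "metrics.json").write_text(json.dumps({"val_loss": loss}))
    summary = collect_metrics(d)
    best = min(summary, key=lambda r: summary[r]["val_loss"])
    print(best)  # → rotary
```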

Comparison with Similar Tools

  • OpenAI Research Tools — proprietary; Autoresearch is fully open source and local-first
  • MLflow — tracks experiments; Autoresearch autonomously designs and runs them
  • AutoML (AutoGluon, FLAML) — optimizes hyperparameters; Autoresearch explores research hypotheses
  • DeerFlow — general-purpose research agent; Autoresearch is specialized for ML training experiments

FAQ

Q: What GPU is required? A: A single consumer GPU (e.g., RTX 3090 or 4090) is sufficient for the default experiment configs.

Q: Can I use local LLMs instead of API-based models? A: Yes. Autoresearch supports local model inference through compatible backends.

Q: Does it only work with nanochat-style training? A: The default setup uses nanochat, but the agent architecture supports custom training scripts.

Q: How long does a typical research run take? A: Depending on hypothesis complexity and the configured compute budget, runs typically take 2–8 hours.
