May 13, 2026 · 3 min read

Autoresearch — Automated AI Research Agents by Karpathy

An open-source system by Andrej Karpathy that uses AI agents to autonomously run machine learning research experiments on single-GPU setups.

Introduction

Autoresearch is an open-source project by Andrej Karpathy that automates machine learning research using AI agents. It enables researchers to define experiment hypotheses and let autonomous agents handle implementation, training, evaluation, and reporting on commodity single-GPU hardware.

What Autoresearch Does

  • Autonomously designs and runs ML training experiments based on high-level hypotheses
  • Generates code, trains models, and evaluates results without manual intervention
  • Produces structured research reports with metrics, charts, and analysis
  • Iterates on experiments by analyzing prior results and proposing follow-ups
  • Operates on single-GPU setups, making research accessible without large compute clusters

Architecture Overview

Autoresearch uses an agentic loop where an LLM-powered research agent receives a hypothesis, writes training code (based on nanochat or similar frameworks), executes it in a sandboxed environment, collects metrics, and decides whether to iterate or conclude. Results are logged in a structured format for human review.
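The agentic loop above can be sketched in a few lines. This is a minimal illustration, not Autoresearch's actual API: all names here (`ExperimentResult`, `run_experiment`, `research_loop`) and the stopping rule are assumptions standing in for the LLM-driven decision making.

```python
# Minimal sketch of an agentic research loop: generate code, run it in a
# sandbox, inspect metrics, and decide whether to iterate or conclude.
# All identifiers are illustrative, not Autoresearch's real interface.
import json
from dataclasses import dataclass


@dataclass
class ExperimentResult:
    metrics: dict
    logs: str = ""


def run_experiment(code: str) -> ExperimentResult:
    # Stand-in for sandboxed execution of generated training code.
    # A real run would train a model and collect actual metrics.
    return ExperimentResult(metrics={"val_loss": 2.1})


def research_loop(hypothesis: str, budget: int = 3) -> list[ExperimentResult]:
    """Iterate up to `budget` times, stopping early if the goal is met."""
    history = []
    for step in range(budget):
        # In the real system, an LLM would write this training script.
        code = f"# training script for: {hypothesis} (iteration {step})"
        result = run_experiment(code)
        history.append(result)
        # A fixed threshold stands in for the agent's judgment that the
        # hypothesis has been resolved.
        if result.metrics["val_loss"] < 2.0:
            break
    return history


results = research_loop("does a deeper MLP reduce val loss?")
print(json.dumps([r.metrics for r in results]))
```

Since the stubbed metric never crosses the threshold, the loop exhausts its full budget; a real agent would instead analyze the logged metrics to choose a follow-up experiment or conclude.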

Self-Hosting & Configuration

  • Requires Python 3.10+ and a CUDA-capable GPU for training experiments
  • Configure the LLM provider (OpenAI, Anthropic, or local models) via environment variables
  • Experiment configs are YAML files defining hypotheses, compute budgets, and evaluation criteria
  • Results are stored locally in a structured directory with logs, checkpoints, and reports
  • No cloud dependencies required; runs entirely on local hardware
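A YAML experiment config along the lines described above might look like the following. The field names are hypothetical, for illustration only; consult the project's own examples for the actual schema.

```yaml
# Hypothetical experiment config; field names are illustrative,
# not Autoresearch's documented schema.
hypothesis: "Does rotary positional encoding improve val loss at 10M params?"
compute_budget:
  max_gpu_hours: 4
  max_iterations: 3
evaluation:
  metric: val_loss
  target: 2.0
```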

Key Features

  • Fully autonomous research loop from hypothesis to final report
  • Built on nanochat for efficient small-model training experiments
  • Structured output format enables easy comparison across experiment runs
  • Cost-aware: designed for single-GPU research rather than large cluster workloads
  • Extensible agent architecture supports custom evaluation metrics and training frameworks
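Because each run writes its results in a structured layout, comparing experiments can be a simple directory scan. The layout assumed here (`<run-name>/metrics.json`) is a guess for illustration, not the project's documented format:

```python
# Sketch of comparing experiment runs from a structured results directory.
# The layout (<results_dir>/<run>/metrics.json) is an assumption.
import json
import tempfile
from pathlib import Path


def collect_metrics(results_dir: str) -> dict[str, dict]:
    """Map each run name to the metrics dict stored in its directory."""
    out = {}
    for metrics_file in sorted(Path(results_dir).glob("*/metrics.json")):
        out[metrics_file.parent.name] = json.loads(metrics_file.read_text())
    return out


# Usage with a throwaway directory and illustrative data:
with tempfile.TemporaryDirectory() as d:
    for name, loss in [("baseline", 2.3), ("rotary", 2.1)]:
        run = Path(d) / name
        run.mkdir()
        (run / "metrics.json").write_text(json.dumps({"val_loss": loss}))
    summary = collect_metrics(d)
    best = min(summary, key=lambda r: summary[r]["val_loss"])
    print(best)  # → rotary
```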

Comparison with Similar Tools

  • OpenAI Research Tools — proprietary; Autoresearch is fully open source and local-first
  • MLflow — tracks experiments; Autoresearch autonomously designs and runs them
  • AutoML (AutoGluon, FLAML) — optimizes hyperparameters; Autoresearch explores research hypotheses
  • DeerFlow — general-purpose research agent; Autoresearch is specialized for ML training experiments

FAQ

Q: What GPU is required? A: A single consumer GPU (e.g., RTX 3090 or 4090) is sufficient for the default experiment configs.

Q: Can I use local LLMs instead of API-based models? A: Yes. Autoresearch supports local model inference through compatible backends.

Q: Does it only work with nanochat-style training? A: The default setup uses nanochat, but the agent architecture supports custom training scripts.

Q: How long does a typical research run take? A: Depending on hypothesis complexity and the configured compute budget, runs typically take 2–8 hours.
