Scripts · Jun 2, 2026 · 3 min read

Pi AutoResearch — Autonomous Experiment Loop for AI Agents

An extension that enables AI agents to run autonomous research loops — formulating hypotheses, designing experiments, executing code, analyzing results, and iterating without human intervention.

Ready-to-run agent install

This asset can be installed after the agent chooses its runtime, checks the plan, and runs the matching command.

Native · 98/100 · Policy: allow

  • Agent surface: Any MCP/CLI agent
  • Kind: Skill
  • Install: Single
  • Trust: Established
  • Entrypoint: Pi AutoResearch Overview
Direct install command:

npx -y tokrepo@latest install a0209cf4-5e7d-11f1-9bc6-00163e2b0d79 --target codex

Run this command only after a dry run confirms the install plan.

Introduction

Pi AutoResearch adds an autonomous experiment loop to AI coding agents. Given a research question or hypothesis, the agent designs experiments, writes and executes code, collects metrics, and iterates on its approach — all without requiring human approval at each step. It is designed for ML researchers and data scientists who want to accelerate exploratory work.

What Pi AutoResearch Does

  • Decomposes research questions into testable hypotheses
  • Generates experiment code with proper controls and metrics
  • Executes experiments in sandboxed environments and collects results
  • Analyzes outcomes and decides whether to refine, pivot, or conclude
  • Produces structured research reports with reproducible notebooks
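This page does not document the format of the reports and notebooks the extension produces. The sketch below is a hypothetical illustration only. It shows how a list of experiment steps could be serialized into a minimal Jupyter notebook using the public nbformat v4 JSON layout. `ExperimentStep`, `toNotebook`, and the example values are assumptions, not part of Pi AutoResearch's API.

```typescript
// Hypothetical record of one experiment step (not Pi AutoResearch's real type).
interface ExperimentStep {
  hypothesis: string;
  code: string; // Python source that was executed
}

// Minimal cell shapes from the nbformat v4 JSON schema.
type Cell =
  | { cell_type: "markdown"; metadata: Record<string, unknown>; source: string[] }
  | {
      cell_type: "code";
      execution_count: null;
      metadata: Record<string, unknown>;
      outputs: unknown[];
      source: string[];
    };

interface Notebook {
  cells: Cell[];
  metadata: Record<string, unknown>;
  nbformat: 4;
  nbformat_minor: number;
}

// .ipynb files store multi-line sources as a list of lines; every line keeps
// its trailing "\n" except the last one.
function toSourceLines(text: string): string[] {
  const lines = text.split("\n");
  return lines.map((line, i) => (i < lines.length - 1 ? line + "\n" : line));
}

function markdownCell(text: string): Cell {
  return { cell_type: "markdown", metadata: {}, source: toSourceLines(text) };
}

function codeCell(code: string): Cell {
  // Outputs are left empty so a reader re-executes the cell to reproduce results.
  return { cell_type: "code", execution_count: null, metadata: {}, outputs: [], source: toSourceLines(code) };
}

function toNotebook(title: string, steps: ExperimentStep[]): Notebook {
  const cells: Cell[] = [markdownCell(`# ${title}`)];
  steps.forEach((step, i) => {
    cells.push(markdownCell(`## Experiment ${i + 1}\n\nHypothesis: ${step.hypothesis}`));
    cells.push(codeCell(step.code));
  });
  return {
    cells,
    metadata: { kernelspec: { name: "python3", display_name: "Python 3" } },
    nbformat: 4,
    nbformat_minor: 4,
  };
}

// Example input; the values are made up for illustration.
const nb = toNotebook("Learning-rate sweep", [
  { hypothesis: "A lower learning rate reduces validation loss", code: "lr = 1e-4\nprint(lr)" },
]);
console.log(JSON.stringify(nb, null, 1));
```

Saving the printed JSON to a file with an `.ipynb` extension should yield a notebook that Jupyter can open. Leaving the outputs empty keeps the file small and forces a reader to rerun the experiment.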

Architecture Overview

Pi AutoResearch operates as a TypeScript extension that wraps a coding agent in an experiment loop controller. The controller maintains a state machine with five phases: hypothesis formulation, experiment design, execution, analysis, and decision. Each phase invokes the underlying agent with a structured prompt. Experiments execute in isolated containers so that their side effects do not reach the host system. Results are stored in a local SQLite database, which makes it possible to compare experiments against each other.
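The phase cycle described above can be modeled as a small state machine. The sketch below is an illustration under assumptions, not the extension's source. The `Phase` and `Decision` names and the extra terminal `done` state are hypothetical. So is the mapping in which *refine* returns to experiment design and *pivot* returns to hypothesis formulation.

```typescript
// Phases named in the architecture description, plus a hypothetical terminal "done".
type Phase = "hypothesis" | "design" | "execution" | "analysis" | "decision" | "done";

// Possible outcomes of the decision phase.
type Decision = "refine" | "pivot" | "conclude";

// Transition function. Mapping refine -> design and pivot -> hypothesis is an assumption.
function nextPhase(current: Phase, decision?: Decision): Phase {
  switch (current) {
    case "hypothesis":
      return "design";
    case "design":
      return "execution";
    case "execution":
      return "analysis";
    case "analysis":
      return "decision";
    case "decision":
      if (decision === undefined) throw new Error("the decision phase needs a decision");
      if (decision === "refine") return "design"; // adjust the experiment, keep the hypothesis
      if (decision === "pivot") return "hypothesis"; // formulate a new hypothesis
      return "done"; // conclude
    case "done":
      return "done";
  }
}

// Drive the machine with scripted decisions standing in for the agent's analysis.
function trace(decisions: Decision[]): Phase[] {
  const pending = decisions.slice();
  const visited: Phase[] = [];
  let phase: Phase = "hypothesis";
  while (phase !== "done") {
    visited.push(phase);
    phase = phase === "decision" ? nextPhase(phase, pending.shift()) : nextPhase(phase);
  }
  visited.push(phase);
  return visited;
}

console.log(trace(["refine", "conclude"]).join(" -> "));
```

Given the scripted sequence `["refine", "conclude"]`, `trace` passes through design, execution, analysis, and decision twice before reaching `done`. Keeping the transitions in one pure function makes the loop easy to test without invoking a model.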

Self-Hosting & Configuration

  • Requires Node.js 18+ and Docker for sandboxed experiment execution
  • Configure the model provider, iteration limits, and resource budgets in autoresearch.config.json
  • Set compute constraints (maximum CPU time, memory, GPU) for each experiment run
  • Supports integration with MLflow or Weights & Biases for experiment tracking
  • All data stays local unless external tracking services are configured
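This page does not spell out the configuration schema, so the following is a hypothetical sketch. Every field name (`modelProvider`, `maxIterations`, `budget`, `tracking`) and every default value is an assumption chosen for illustration. The sketch shows one way to load the contents of `autoresearch.config.json`, fill in defaults, and reject invalid values before a loop starts.

```typescript
// Hypothetical schema for autoresearch.config.json; field names and defaults are illustrative.
interface ResourceBudget {
  maxCpuSeconds: number;
  memoryMb: number;
  gpus: number;
}

interface AutoResearchConfig {
  modelProvider: string;
  maxIterations: number;
  budget: ResourceBudget;
  tracking?: "mlflow" | "wandb"; // "wandb" = Weights & Biases (assumed value); omit to keep data local
}

// What the JSON file may contain before defaults are applied.
type RawConfig = Omit<Partial<AutoResearchConfig>, "budget"> & { budget?: Partial<ResourceBudget> };

function parseConfig(json: string): AutoResearchConfig {
  const raw = JSON.parse(json) as RawConfig;
  if (typeof raw.modelProvider !== "string" || raw.modelProvider === "") {
    throw new Error("modelProvider is required");
  }
  if (raw.tracking !== undefined && raw.tracking !== "mlflow" && raw.tracking !== "wandb") {
    throw new Error(`unsupported tracking backend: ${String(raw.tracking)}`);
  }
  const config: AutoResearchConfig = {
    modelProvider: raw.modelProvider,
    maxIterations: raw.maxIterations ?? 10, // hypothetical default
    budget: {
      maxCpuSeconds: raw.budget?.maxCpuSeconds ?? 3600, // hypothetical default
      memoryMb: raw.budget?.memoryMb ?? 4096, // hypothetical default
      gpus: raw.budget?.gpus ?? 0,
    },
    tracking: raw.tracking,
  };
  if (Math.floor(config.maxIterations) !== config.maxIterations || config.maxIterations < 1) {
    throw new Error("maxIterations must be a positive integer");
  }
  if (config.budget.gpus < 0) throw new Error("budget.gpus cannot be negative");
  return config;
}

// Example file contents; unspecified budget fields fall back to the defaults above.
const cfg = parseConfig('{"modelProvider": "my-provider", "maxIterations": 20, "budget": {"gpus": 1}}');
console.log(cfg.budget);
```

Validating the configuration once, up front, means a bad value fails immediately. The alternative is discovering it several iterations into an autonomous run.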

Key Features

  • Fully autonomous hypothesis-test-iterate loop
  • Sandboxed execution prevents experiments from affecting the host system
  • Structured decision framework for when to continue, pivot, or stop
  • Built-in experiment comparison across iterations
  • Exportable Jupyter notebooks for reproducibility
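The extension itself keeps results in SQLite. The sketch below substitutes an in-memory array to show the kind of cross-iteration comparison involved. It picks the best run and measures that run's gain over the first (baseline) iteration. `RunResult`, `compareRuns`, and the metric values are hypothetical.

```typescript
// Hypothetical per-iteration record; an in-memory array stands in for the SQLite table.
interface RunResult {
  iteration: number;
  metric: number; // e.g. validation loss or accuracy
}

interface Comparison {
  baseline: RunResult; // earliest iteration
  best: RunResult;
  improvement: number; // > 0 means the best run beats the baseline
}

function compareRuns(runs: RunResult[], lowerIsBetter = true): Comparison {
  if (runs.length === 0) throw new Error("no runs to compare");
  const byIteration = runs.slice().sort((a, b) => a.iteration - b.iteration);
  const baseline = byIteration[0];
  const isBetter = (a: RunResult, b: RunResult) => (lowerIsBetter ? a.metric < b.metric : a.metric > b.metric);
  // On ties, reduce keeps the earlier iteration.
  const best = byIteration.reduce((acc, run) => (isBetter(run, acc) ? run : acc));
  const improvement = lowerIsBetter ? baseline.metric - best.metric : best.metric - baseline.metric;
  return { baseline, best, improvement };
}

// Made-up validation losses for three iterations.
const summary = compareRuns([
  { iteration: 1, metric: 0.52 },
  { iteration: 2, metric: 0.47 },
  { iteration: 3, metric: 0.49 },
]);
console.log(`best: iteration ${summary.best.iteration}, improvement ${summary.improvement.toFixed(2)}`);
```

Pass `lowerIsBetter = false` for metrics such as accuracy, where higher values are better.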

Comparison with Similar Tools

  • AutoGen — general multi-agent framework; Pi AutoResearch specializes in the experiment loop pattern
  • DSPy — optimizes LLM programs; Pi AutoResearch runs open-ended experimental research
  • Kedro — ML pipeline framework; Pi AutoResearch focuses on autonomous exploration, not production pipelines
  • Jupyter — interactive notebooks; Pi AutoResearch automates the entire experiment cycle

FAQ

Q: What types of experiments can it run? A: Any experiment expressible as Python or TypeScript code — ML training runs, data analysis, benchmarking, API testing, and statistical simulations.

Q: How does it decide when to stop? A: The controller uses configurable stopping criteria: maximum iterations, convergence thresholds, or budget limits on compute time and API cost.
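The stopping criteria listed in the answer above can be combined into one check that runs after each iteration. This is a minimal sketch with hypothetical field names and units (seconds and US dollars). The real controller's criteria and their precedence may differ. Convergence is simplified here to a single-step change in the metric.

```typescript
// Hypothetical stopping criteria; names and units (seconds, USD) are assumptions.
interface StopCriteria {
  maxIterations: number;
  convergenceEpsilon: number; // metric changes smaller than this count as converged
  maxComputeSeconds: number;
  maxApiCostUsd: number;
}

interface LoopState {
  iteration: number; // iterations completed so far
  metricHistory: number[]; // primary metric after each iteration
  computeSeconds: number;
  apiCostUsd: number;
}

type StopReason = "max-iterations" | "compute-budget" | "cost-budget" | "converged";

// Return the first criterion that is met, or null to keep iterating.
function shouldStop(s: LoopState, c: StopCriteria): StopReason | null {
  if (s.iteration >= c.maxIterations) return "max-iterations";
  if (s.computeSeconds >= c.maxComputeSeconds) return "compute-budget";
  if (s.apiCostUsd >= c.maxApiCostUsd) return "cost-budget";
  const h = s.metricHistory;
  // Simplified convergence test: compare only the last two iterations.
  if (h.length >= 2 && Math.abs(h[h.length - 1] - h[h.length - 2]) < c.convergenceEpsilon) {
    return "converged";
  }
  return null;
}

const criteria: StopCriteria = { maxIterations: 10, convergenceEpsilon: 0.001, maxComputeSeconds: 3600, maxApiCostUsd: 5 };
console.log(shouldStop({ iteration: 3, metricHistory: [0.5, 0.4, 0.3995], computeSeconds: 900, apiCostUsd: 1.2 }, criteria));
```

The budget checks run before the convergence test. As a result, an exhausted budget is always reported as the stopping reason, even when the metric has also plateaued.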

Q: Can I review experiments before they execute? A: Yes. The --review flag pauses before each execution for human approval, which is useful when running expensive GPU experiments.

Q: Does it support GPU workloads? A: Yes, Docker containers can be configured with GPU passthrough for ML training experiments.
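Per-run CPU, memory, and GPU limits, including GPU passthrough, correspond to standard `docker run` flags. The sketch below builds such an argument list. The `SandboxLimits` shape is hypothetical. Disabling networking (`--network none`) and capping CPU time with `--ulimit cpu` are illustrative choices, not confirmed behavior of Pi AutoResearch. Note that `--gpus` requires the NVIDIA Container Toolkit on the host.

```typescript
// Hypothetical per-run limits; field names are illustrative, not the extension's schema.
interface SandboxLimits {
  cpus: number; // may be fractional, e.g. 1.5
  memoryMb: number;
  maxCpuSeconds: number;
  gpus: number; // 0 = no GPU passthrough
}

// Build arguments for `docker run` using standard Docker CLI flags.
function dockerRunArgs(image: string, command: string[], limits: SandboxLimits): string[] {
  const args: string[] = [
    "run",
    "--rm", // remove the container when the experiment exits
    "--network", "none", // illustrative choice: no network access from the sandbox
    `--cpus=${limits.cpus}`,
    `--memory=${limits.memoryMb}m`,
    // RLIMIT_CPU caps CPU seconds per process (soft:hard), not wall-clock time.
    "--ulimit", `cpu=${limits.maxCpuSeconds}:${limits.maxCpuSeconds}`,
  ];
  if (limits.gpus > 0) {
    // GPU passthrough by device count.
    args.push("--gpus", String(limits.gpus));
  }
  args.push(image, ...command);
  return args;
}

const args = dockerRunArgs("python:3.11-slim", ["python", "experiment.py"], {
  cpus: 2,
  memoryMb: 4096,
  maxCpuSeconds: 600,
  gpus: 1,
});
console.log(["docker", ...args].join(" "));
```

In Node.js, the resulting array can be passed directly to `child_process.spawn("docker", args)`. That call avoids shell quoting issues. A wall-clock timeout would still have to be enforced by the controller itself.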
