Introduction
Pi AutoResearch adds an autonomous experiment loop to AI coding agents. Given a research question or hypothesis, the agent designs experiments, writes and executes code, collects metrics, and iterates on its approach — all without requiring human approval at each step. It is designed for ML researchers and data scientists who want to accelerate exploratory work.
What Pi AutoResearch Does
- Decomposes research questions into testable hypotheses
- Generates experiment code with proper controls and metrics
- Executes experiments in sandboxed environments and collects results
- Analyzes outcomes and decides whether to refine, pivot, or conclude
- Produces structured research reports with reproducible notebooks
Architecture Overview
Pi AutoResearch operates as a TypeScript extension that wraps a coding agent in an experiment loop controller. The controller maintains a state machine with phases: hypothesis formulation, experiment design, execution, analysis, and decision. Each phase invokes the underlying agent with structured prompts. Execution happens in isolated containers to prevent side effects. Results are stored in a local SQLite database for cross-experiment comparison.
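The phase cycle described above can be sketched as a small transition function. This is only an illustration: the phase names come from the description, but the `Phase` and `Decision` types, the `nextPhase` and `trace` helpers, and the assumption that "refine" returns to experiment design while "pivot" returns to hypothesis formulation are all hypothetical, not the project's actual code.

```typescript
// Phase names follow the architecture description; all identifiers are hypothetical.
type Phase = "hypothesis" | "design" | "execution" | "analysis" | "decision";
type Decision = "refine" | "pivot" | "conclude";

// Fixed forward transitions for every phase except the decision phase.
const NEXT_PHASE: Record<Exclude<Phase, "decision">, Phase> = {
  hypothesis: "design",
  design: "execution",
  execution: "analysis",
  analysis: "decision",
};

// Returns the phase that follows `current`, or null when the loop ends.
function nextPhase(current: Phase, decision?: Decision): Phase | null {
  if (current !== "decision") return NEXT_PHASE[current];
  switch (decision) {
    case "refine":
      return "design"; // assumption: keep the hypothesis, redesign the experiment
    case "pivot":
      return "hypothesis"; // assumption: start over with a new hypothesis
    case "conclude":
      return null; // the loop is finished
    default:
      throw new Error("a decision is required to leave the decision phase");
  }
}

// Walks the state machine from the first phase, consuming one decision per cycle.
function trace(decisions: Decision[]): Phase[] {
  const visited: Phase[] = [];
  let phase: Phase | null = "hypothesis";
  let i = 0;
  while (phase !== null) {
    visited.push(phase);
    phase = nextPhase(phase, phase === "decision" ? decisions[i++] : undefined);
  }
  return visited;
}
```

For example, `trace(["conclude"])` visits the five phases once, while `trace(["refine", "conclude"])` repeats the design-through-decision cycle a second time before stopping.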
Self-Hosting & Configuration
- Requires Node.js 18+ and Docker for sandboxed experiment execution
- Configure via autoresearch.config.json for model provider, iteration limits, and resource budgets
- Set compute constraints (max CPU time, memory, GPU) per experiment run
- Supports integration with MLflow or Weights & Biases for experiment tracking
- All data stays local unless external tracking services are configured
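As a rough illustration of the settings listed above, the sketch below models a configuration object and shows one way a controller could translate a per-run budget into `docker run` arguments. The field names (`modelProvider`, `maxIterations`, `budget`, and so on), the `dockerRunArgs` helper, and the example values are assumptions made for this example; the actual schema of autoresearch.config.json is not documented here and may differ. The Docker flags themselves (`--rm`, `--memory`, `--ulimit cpu=`, `--gpus`) are standard Docker CLI options, and `--gpus` additionally requires the NVIDIA Container Toolkit on the host.

```typescript
// Hypothetical config shape; the real autoresearch.config.json schema may differ.
interface ExperimentBudget {
  maxCpuSeconds: number; // CPU time limit per experiment run
  memoryMb: number; // memory limit per experiment run
  gpus: number; // number of GPUs to expose; 0 disables GPU access
}

interface AutoResearchConfig {
  modelProvider: string;
  maxIterations: number;
  budget: ExperimentBudget;
}

// Example values only; none of these are project defaults.
const exampleConfig: AutoResearchConfig = {
  modelProvider: "example-provider",
  maxIterations: 10,
  budget: { maxCpuSeconds: 600, memoryMb: 4096, gpus: 0 },
};

// Builds the argument list for `docker run` from a budget.
function dockerRunArgs(image: string, budget: ExperimentBudget): string[] {
  const args = [
    "run",
    "--rm", // remove the container when the experiment exits
    "--memory", `${budget.memoryMb}m`, // hard memory limit
    "--ulimit", `cpu=${budget.maxCpuSeconds}`, // CPU time limit in seconds
  ];
  if (budget.gpus > 0) {
    args.push("--gpus", String(budget.gpus)); // GPU passthrough
  }
  args.push(image);
  return args;
}

const args = dockerRunArgs("python:3.12-slim", exampleConfig.budget);
console.log(["docker", ...args].join(" "));
```

Returning an argument array rather than a single shell string keeps the values safe to pass to a process-spawning API without shell quoting.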
Key Features
- Fully autonomous hypothesis-test-iterate loop
- Sandboxed execution prevents experiments from affecting the host system
- Structured decision framework for when to continue, pivot, or stop
- Built-in experiment comparison across iterations
- Exportable Jupyter notebooks for reproducibility
Comparison with Similar Tools
- AutoGen — general multi-agent framework; Pi AutoResearch specializes in the experiment loop pattern
- DSPy — optimizes LLM programs; Pi AutoResearch runs open-ended experimental research
- Kedro — ML pipeline framework; Pi AutoResearch focuses on autonomous exploration, not production pipelines
- Jupyter — interactive notebooks; Pi AutoResearch automates the entire experiment cycle
FAQ
Q: What types of experiments can it run? A: Any experiment expressible as Python or TypeScript code — ML training runs, data analysis, benchmarking, API testing, and statistical simulations.
Q: How does it decide when to stop? A: The controller uses configurable stopping criteria: maximum iterations, convergence thresholds, or budget limits on compute time and API cost.
Q: Can I review experiments before they execute? A: Yes, the --review flag pauses before each execution for human approval, which is useful when running expensive GPU experiments.
Q: Does it support GPU workloads? A: Yes, Docker containers can be configured with GPU passthrough for ML training experiments.
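The stopping criteria mentioned in the FAQ (maximum iterations, convergence thresholds, and budget limits on compute time and API cost) could be combined along the following lines. This is a minimal sketch under stated assumptions: the `StoppingCriteria` and `LoopState` types, the `stopReason` function, and the definition of convergence as "the tracked metric changed by less than the threshold between the last two iterations" are hypothetical and not taken from the project.

```typescript
// Hypothetical types; field names are illustrative, not the project's schema.
interface StoppingCriteria {
  maxIterations: number;
  convergenceThreshold: number; // minimum metric change that counts as progress
  maxComputeSeconds: number;
  maxApiCostUsd: number;
}

interface LoopState {
  iteration: number; // completed iterations so far
  metricHistory: number[]; // one tracked metric value per completed iteration
  computeSeconds: number; // compute time consumed so far
  apiCostUsd: number; // API cost accumulated so far
}

type StopReason = "max_iterations" | "compute_budget" | "api_cost_budget" | "converged";

// Returns the reason to stop, or null if the loop should continue.
function stopReason(criteria: StoppingCriteria, state: LoopState): StopReason | null {
  if (state.iteration >= criteria.maxIterations) return "max_iterations";
  if (state.computeSeconds >= criteria.maxComputeSeconds) return "compute_budget";
  if (state.apiCostUsd >= criteria.maxApiCostUsd) return "api_cost_budget";

  // Assumed convergence rule: compare the two most recent metric values.
  const history = state.metricHistory;
  const n = history.length;
  if (n >= 2 && Math.abs(history[n - 1] - history[n - 2]) < criteria.convergenceThreshold) {
    return "converged";
  }
  return null;
}
```

Hard limits are checked before convergence so that a run that has exhausted its budget reports the budget as the reason, even if the metric also happens to have plateaued.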