Stagehand — AI Browser Automation Framework
Three AI primitives — act(), extract(), observe() — to automate any website with natural language. By Browserbase. 21K+ stars.
What it is
Stagehand is an AI browser automation framework built by Browserbase. It exposes three primitives -- act(), extract(), and observe() -- that let you automate web interactions using natural language instead of CSS selectors or XPaths. Behind the scenes, Stagehand uses vision models to understand the page and execute actions.
Stagehand is designed for developers building web scrapers, testing frameworks, or automation workflows who want to write instructions in plain English rather than brittle selector-based scripts. It runs locally or on Browserbase's cloud infrastructure.
How it saves time or tokens
Traditional browser automation with Puppeteer or Playwright requires writing and maintaining CSS selectors that break when the page layout changes. Stagehand's natural-language approach is more resilient to UI changes because an instruction describes what to do rather than where an element sits in the DOM.
The three-primitive API keeps the learning curve minimal. Instead of learning a complex automation framework, you write act('click the login button') and Stagehand handles the rest.
How to use
- Install Stagehand:
npm install @browserbasehq/stagehand
- Create an automation script:
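// Save as an ES module (for example script.mjs) so top-level await works.
import { Stagehand } from '@browserbasehq/stagehand';

// Launch a local browser and use GPT-4o to interpret instructions.
const stagehand = new Stagehand({
  env: 'LOCAL',
  modelName: 'gpt-4o',
  modelClientOptions: { apiKey: process.env.OPENAI_API_KEY }
});

await stagehand.init();
await stagehand.page.goto('https://example.com');

// Describe the action in plain English instead of writing a selector.
await stagehand.act({ action: 'click the sign up button' });

// Shut the browser down when the script is finished.
await stagehand.close();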
- Run your script with Node.js.
Example
import { Stagehand } from '@browserbasehq/stagehand';
import { z } from 'zod';

const stagehand = new Stagehand({ env: 'LOCAL' });
await stagehand.init();
await stagehand.page.goto('https://news.ycombinator.com');

// Extract structured data from the page.
// Stagehand expects a Zod schema, so the array is wrapped in a top-level object.
const { stories } = await stagehand.extract({
  instruction: 'Extract the title and URL of the top 5 stories',
  schema: z.object({
    stories: z.array(
      z.object({
        title: z.string(),
        url: z.string()
      })
    )
  })
});

console.log(stories);
await stagehand.close();
This extracts structured data from Hacker News using a natural language instruction and a Zod schema that describes the output format.
Common pitfalls
- Stagehand requires an LLM API key (OpenAI or Anthropic). Each action sends a screenshot to the vision model, which costs tokens. High-frequency automation scripts can accumulate significant API costs.
- The LOCAL env mode requires a Chromium browser installed on the machine. If Chromium is missing, Stagehand will fail to initialize. Use Browserbase's cloud mode to avoid local browser management.
- Natural language instructions must be specific. Vague instructions like 'fill out the form' may produce unexpected results. Write precise actions like 'type john@example.com into the email field'.
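To put the API-cost pitfall above in rough numbers, the sketch below estimates the LLM spend for a run. Every figure in it (tokens per action, price per million tokens) is a placeholder assumption, not Stagehand or provider pricing; substitute values from your own usage logs and your provider's price list. The function and type names are hypothetical.

```typescript
// Hypothetical inputs: none of these numbers come from Stagehand or any provider.
interface CostAssumptions {
  inputTokensPerAction: number;  // prompt plus page context sent per call
  outputTokensPerAction: number; // tokens the model returns per call
  usdPerMillionInput: number;    // your provider's input price
  usdPerMillionOutput: number;   // your provider's output price
}

// Estimate the total USD cost of `actions` LLM-backed calls.
function estimateRunCostUsd(actions: number, a: CostAssumptions): number {
  const inputCost = (actions * a.inputTokensPerAction * a.usdPerMillionInput) / 1_000_000;
  const outputCost = (actions * a.outputTokensPerAction * a.usdPerMillionOutput) / 1_000_000;
  return inputCost + outputCost;
}

// Placeholder values chosen only to make the arithmetic easy to follow.
const example: CostAssumptions = {
  inputTokensPerAction: 2_000,
  outputTokensPerAction: 100,
  usdPerMillionInput: 2.5,
  usdPerMillionOutput: 10,
};

// 1,000 actions under the placeholder assumptions above.
console.log(estimateRunCostUsd(1_000, example).toFixed(2)); // prints "6.00"
```

Multiplying by the expected number of runs per day gives a quick sanity check before scheduling a high-frequency script.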
Frequently Asked Questions
Which models does Stagehand support?
Stagehand supports OpenAI models (GPT-4o, GPT-4o-mini) and Anthropic models (Claude Sonnet). The model is configured when initializing the Stagehand instance. Vision-capable models are required because Stagehand sends page screenshots for visual understanding.
Does Stagehand replace Playwright or Puppeteer?
Stagehand complements rather than replaces traditional automation tools. It uses Playwright under the hood for browser control. For stable pages with predictable selectors, Playwright is faster and cheaper. Stagehand is better for dynamic pages or when you want resilience to UI changes.
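One way to combine the two, sketched under assumptions: try a fast selector-based step first and fall back to a natural-language step only when it throws. The helper `withFallback`, its callback parameters, and the `#signup` selector are hypothetical names; in a real script the callbacks would wrap something like a Playwright click and a Stagehand act() call.

```typescript
// Hypothetical helper: run the cheap deterministic step first and use the
// LLM-backed step only if the first one throws.
type Step = () => Promise<void>;

async function withFallback(
  selectorStep: Step,
  naturalLanguageStep: Step,
): Promise<"selector" | "fallback"> {
  try {
    await selectorStep();
    return "selector";
  } catch {
    // Selector missing or stale: let the natural-language step handle it.
    await naturalLanguageStep();
    return "fallback";
  }
}

// Usage sketch (assumes an initialized `stagehand`; '#signup' is a made-up selector):
// await withFallback(
//   () => stagehand.page.click('#signup'),
//   () => stagehand.act({ action: 'click the sign up button' }),
// );
```

The return value records which path ran, which makes it easy to log how often the slower fallback is actually needed.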
Can Stagehand handle login flows and authenticated sessions?
Stagehand can perform login flows using act() to type credentials and click buttons. For repeated automation runs, you can persist browser cookies and session storage to avoid logging in each time. Stagehand exposes the underlying Playwright page object for cookie management.
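A minimal sketch of the cookie persistence described above, assuming cookies in the shape Playwright returns from `context.cookies()`, where `expires` is a Unix timestamp in seconds and `-1` marks a session cookie. The helper names and the `cookies.json` file name are hypothetical. Only serialization and expiry filtering are shown, and the commented usage assumes the Playwright context is reachable through `stagehand.page.context()`.

```typescript
// Subset of the cookie fields Playwright returns from context.cookies().
interface StoredCookie {
  name: string;
  value: string;
  domain: string;
  path: string;
  expires: number; // Unix time in seconds; -1 means a session cookie
}

// Serialize cookies so they can be written to disk between runs.
function serializeCookies(cookies: StoredCookie[]): string {
  return JSON.stringify(cookies, null, 2);
}

// Parse saved cookies and drop those that have already expired.
// Session cookies (expires === -1) are kept.
function loadFreshCookies(json: string, nowSeconds: number): StoredCookie[] {
  const cookies = JSON.parse(json) as StoredCookie[];
  return cookies.filter((c) => c.expires === -1 || c.expires > nowSeconds);
}

// Usage sketch (with `fs` imported from 'node:fs'):
// const context = stagehand.page.context();
// fs.writeFileSync('cookies.json', serializeCookies(await context.cookies()));
// ...on a later run, before navigating to the site:
// const saved = fs.readFileSync('cookies.json', 'utf8');
// await context.addCookies(loadFreshCookies(saved, Date.now() / 1000));
```

Filtering expired cookies before restoring them avoids starting a run with a session the site will reject; if nothing fresh remains, the script can fall back to the act()-based login flow.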
What is the difference between LOCAL and BROWSERBASE mode?
LOCAL mode runs a Chromium browser on your machine. BROWSERBASE mode runs the browser on Browserbase's cloud infrastructure, which handles browser lifecycle, proxy rotation, and captcha solving. BROWSERBASE mode requires a Browserbase account and API key.
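The choice between the two modes can be made at startup. The sketch below picks BROWSERBASE when credentials are present and LOCAL otherwise. The function name, the environment variable names `BROWSERBASE_API_KEY` and `BROWSERBASE_PROJECT_ID`, and the `apiKey` and `projectId` option names are assumptions here, so check them against the Stagehand documentation for your version.

```typescript
// Hypothetical config shape; verify option names against your Stagehand version.
type EnvConfig =
  | { env: "LOCAL" }
  | { env: "BROWSERBASE"; apiKey: string; projectId: string };

// Use the cloud browser only when both credentials are available.
function resolveEnvConfig(vars: Record<string, string | undefined>): EnvConfig {
  const apiKey = vars["BROWSERBASE_API_KEY"];
  const projectId = vars["BROWSERBASE_PROJECT_ID"];
  if (apiKey && projectId) {
    return { env: "BROWSERBASE", apiKey, projectId };
  }
  return { env: "LOCAL" };
}

// Usage sketch:
// const stagehand = new Stagehand({ ...resolveEnvConfig(process.env), modelName: 'gpt-4o' });
```

Keeping the decision in one function lets the same script run on a laptop and in CI without code changes.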
Can Stagehand be used for testing?
Yes. Stagehand's observe() primitive can verify page state, and extract() can check for specific content. However, traditional testing frameworks like Playwright Test or Cypress are better suited to deterministic test assertions. Stagehand is better for exploratory testing and dynamic page validation.
Source & Thanks
Created by Browserbase. Licensed under MIT.
stagehand — ⭐ 21,800+
Thanks to the Browserbase team for creating the most elegant API for AI browser automation.