Stagehand — AI Browser Automation Framework
Three AI primitives — act(), extract(), observe() — to automate any website with natural language. By Browserbase. 21K+ stars.
Safe staging for this asset
This asset is staged first. The copied prompt asks the agent to inspect the staged files before activating scripts, MCP config, or global config.
npx -y tokrepo@latest install 5114a013-a144-4020-8611-c38b74968b99 --target codex
Stages the files first; activation requires reviewing the README and the staged plan.
What it is
Stagehand is an AI browser automation framework built by Browserbase. It exposes three primitives -- act(), extract(), and observe() -- that let you automate web interactions using natural language instead of CSS selectors or XPaths. Behind the scenes, Stagehand uses vision models to understand the page and execute actions.
Stagehand is designed for developers building web scrapers, testing frameworks, or automation workflows who want to write instructions in plain English rather than brittle selector-based scripts. It runs locally or on Browserbase's cloud infrastructure.
How it saves time or tokens
Traditional browser automation with Puppeteer or Playwright requires writing and maintaining CSS selectors that break when the page layout changes. Stagehand's natural language approach is resilient to UI changes because it uses visual understanding rather than DOM structure.
The three-primitive API keeps the learning curve minimal. Instead of learning a complex automation framework, you write act('click the login button') and Stagehand handles the rest.
How to use
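- Install Stagehand:
    npm install @browserbasehq/stagehand
- Create an automation script:
    import { Stagehand } from '@browserbasehq/stagehand';

    // Run the browser locally and use GPT-4o for page understanding.
    const stagehand = new Stagehand({
      env: 'LOCAL',
      modelName: 'gpt-4o',
      modelClientOptions: { apiKey: process.env.OPENAI_API_KEY },
    });

    await stagehand.init();
    await stagehand.page.goto('https://example.com');
    // Describe the action in plain English instead of writing a selector.
    await stagehand.act({ action: 'click the sign up button' });
    // Release the browser when finished.
    await stagehand.close();
- Run your script with Node.js as an ES module (for example a .mjs file), since it uses top-level await.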
Example
import { Stagehand } from '@browserbasehq/stagehand';
import { z } from 'zod'; // npm install zod

const stagehand = new Stagehand({ env: 'LOCAL' });
await stagehand.init();
await stagehand.page.goto('https://news.ycombinator.com');

// Extract structured data from the page.
// Stagehand's documented extract() examples take a Zod schema whose top level
// is an object, so the array is wrapped in a `stories` field.
const { stories } = await stagehand.extract({
  instruction: 'Extract the title and URL of the top 5 stories',
  schema: z.object({
    stories: z.array(
      z.object({
        title: z.string(),
        url: z.string(),
      })
    ),
  }),
});

console.log(stories);
await stagehand.close();
This extracts structured data from Hacker News using a natural language instruction and a Zod schema that defines the output format.
Related on TokRepo
- Browser automation tools -- Compare other AI-powered browser automation frameworks
- Web scraping tools -- Explore tools for extracting data from websites
Common pitfalls
- Stagehand requires an LLM API key (OpenAI or Anthropic). Each action sends a screenshot to the vision model, which costs tokens. High-frequency automation scripts can accumulate significant API costs.
- The LOCAL env mode requires a Chromium browser installed on the machine. If Chromium is missing, Stagehand will fail to initialize. Use Browserbase's cloud mode to avoid local browser management.
- Natural language instructions must be specific. Vague instructions like 'fill out the form' may produce unexpected results. Write precise actions like 'type john@example.com into the email field'.
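The cost pitfall above can be made concrete with a rough estimate. The sketch below is a back-of-the-envelope calculator, not part of Stagehand: the `CostInputs` type, the `estimateRunCostUsd` function, and every number in the usage call are hypothetical placeholders. Substitute token counts measured from your own runs and your provider's current prices.

```typescript
// Rough per-run cost estimate for LLM-backed browser actions.
// All values are caller-supplied assumptions, not published prices.
interface CostInputs {
  actions: number;               // LLM-backed calls (act/extract/observe) per run
  inputTokensPerAction: number;  // prompt tokens per call, including page context
  outputTokensPerAction: number; // completion tokens per call
  inputPricePerMTok: number;     // USD per million input tokens
  outputPricePerMTok: number;    // USD per million output tokens
}

function estimateRunCostUsd(c: CostInputs): number {
  const inputCost = ((c.actions * c.inputTokensPerAction) / 1e6) * c.inputPricePerMTok;
  const outputCost = ((c.actions * c.outputTokensPerAction) / 1e6) * c.outputPricePerMTok;
  return inputCost + outputCost;
}

// Hypothetical workload with placeholder token counts and prices.
const perRun = estimateRunCostUsd({
  actions: 1000,
  inputTokensPerAction: 2000,
  outputTokensPerAction: 100,
  inputPricePerMTok: 2.5,
  outputPricePerMTok: 10,
});
console.log(`Estimated cost per run: $${perRun.toFixed(2)}`);
```

Multiplying the per-run figure by the number of scheduled runs gives a quick sense of whether a script should fall back to plain selectors for its stable steps.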
Frequently asked questions
Stagehand supports OpenAI models (GPT-4o, GPT-4o-mini) and Anthropic models (Claude Sonnet). The model is configured when initializing the Stagehand instance. Vision-capable models are required because Stagehand sends page screenshots for visual understanding.
Stagehand complements rather than replaces traditional automation tools. It uses Playwright under the hood for browser control. For stable pages with predictable selectors, Playwright is faster and cheaper. Stagehand is better for dynamic pages or when you want resilience to UI changes.
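One way to combine the two approaches is to try a deterministic selector first and only fall back to a natural language action when the selector fails. The following is a minimal sketch under stated assumptions: `ClickablePage` is a structural stand-in for the `click(selector, options)` method of a Playwright page, and `ActFn` and `clickWithFallback` are hypothetical names, not Stagehand APIs.

```typescript
// Structural stand-in for the part of a Playwright page this sketch needs.
interface ClickablePage {
  click(selector: string, options?: { timeout?: number }): Promise<void>;
}

// Hypothetical wrapper around whichever act() call your Stagehand version exposes.
type ActFn = (instruction: string) => Promise<unknown>;

type ClickPath = 'selector' | 'natural-language';

async function clickWithFallback(
  page: ClickablePage,
  selector: string,
  instruction: string,
  act: ActFn,
  timeoutMs = 2000,
): Promise<ClickPath> {
  try {
    // Cheap, deterministic path: no LLM call is made if the selector resolves in time.
    await page.click(selector, { timeout: timeoutMs });
    return 'selector';
  } catch {
    // The selector failed (for example after a redesign), so pay for one LLM-driven action.
    await act(instruction);
    return 'natural-language';
  }
}
```

With the call shape used earlier in this document, the last argument could be `(text) => stagehand.act({ action: text })`. The return value records which path ran, which makes it easy to log how often the fallback is actually needed.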
Stagehand can perform login flows using act() to type credentials and click buttons. For repeated automation runs, you can persist browser cookies and session storage to avoid logging in each time. Stagehand exposes the underlying Playwright page object for cookie management.
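Cookie persistence can be sketched as a pair of helpers that save cookies to a JSON file after login and restore them on the next run. This is an illustration under assumptions: `CookieContext` mirrors only the `cookies()` and `addCookies()` methods of a Playwright browser context (reachable from a Playwright page via `page.context()`), and the helper names and file path are hypothetical. It covers cookies only, not session storage.

```typescript
import { existsSync, readFileSync, rmSync, writeFileSync } from 'node:fs';

// Structural subset of a Playwright BrowserContext; cookies are treated as opaque objects.
interface CookieContext {
  cookies(): Promise<object[]>;
  addCookies(cookies: object[]): Promise<void>;
}

// Write the context's current cookies to disk and return how many were saved.
async function saveCookies(context: CookieContext, file: string): Promise<number> {
  const cookies = await context.cookies();
  writeFileSync(file, JSON.stringify(cookies, null, 2), 'utf8');
  return cookies.length;
}

// Load previously saved cookies into the context; returns 0 when no file exists yet.
async function restoreCookies(context: CookieContext, file: string): Promise<number> {
  if (!existsSync(file)) return 0; // first run: nothing to restore, log in normally
  const cookies = JSON.parse(readFileSync(file, 'utf8')) as object[];
  await context.addCookies(cookies);
  return cookies.length;
}

// Delete the saved file to force a fresh login on the next run.
function clearSavedCookies(file: string): void {
  rmSync(file, { force: true });
}
```

A typical flow would call `restoreCookies` right after initialization, run the login steps only if it returns 0 or the site still shows a login page, and call `saveCookies` once the session is established. The saved file contains credentials-equivalent data, so it should be kept out of version control.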
LOCAL mode runs a Chromium browser on your machine. BROWSERBASE mode runs the browser on Browserbase's cloud infrastructure, which handles browser lifecycle, proxy rotation, and captcha solving. BROWSERBASE requires a Browserbase API key and account.
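The choice between the two modes can be driven by environment variables, so the same script runs locally during development and in the cloud in production. The sketch below is a hypothetical helper, not a Stagehand API: `buildStagehandOptions` and `StagehandOptionsSketch` are illustrative names, and the `apiKey` and `projectId` option names and the `BROWSERBASE_API_KEY` and `BROWSERBASE_PROJECT_ID` variable names are assumptions to verify against the Stagehand version you use.

```typescript
type StagehandEnv = 'LOCAL' | 'BROWSERBASE';

// Assumed subset of Stagehand constructor options; check your version's docs.
interface StagehandOptionsSketch {
  env: StagehandEnv;
  apiKey?: string;
  projectId?: string;
}

function buildStagehandOptions(
  vars: Record<string, string | undefined>,
): StagehandOptionsSketch {
  const apiKey = vars.BROWSERBASE_API_KEY;
  const projectId = vars.BROWSERBASE_PROJECT_ID;
  // Use the cloud only when both credentials are present; otherwise run locally.
  if (apiKey && projectId) {
    return { env: 'BROWSERBASE', apiKey, projectId };
  }
  return { env: 'LOCAL' };
}
```

In a script, the result could be passed to the constructor, for example `new Stagehand(buildStagehandOptions(process.env))`, with model settings merged in as needed.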
Stagehand can be used for testing: its observe() primitive can verify page state, and extract() can check for specific content. However, traditional testing frameworks like Playwright Test or Cypress are better suited to deterministic test assertions. Stagehand is better for exploratory testing and dynamic page validation.
Source and acknowledgments
Created by Browserbase. Licensed under MIT.
stagehand — ⭐ 21,800+
Thanks to the Browserbase team for creating the most elegant API for AI browser automation.
Similar assets
Stagehand — AI-Powered Browser Automation SDK
TypeScript SDK that lets you automate browsers using natural language and visual understanding. AI sees the page like a human does. Built on Playwright. 10,000+ GitHub stars.
Selenium — Browser Automation Framework and Ecosystem
Selenium is the original browser automation framework for testing web applications. WebDriver API supports Chrome, Firefox, Safari, Edge across Java, Python, C#, Ruby, JavaScript. The industry standard for E2E web testing since 2004.
Browser Use — AI Agent Browser Automation
Let AI agents control web browsers with natural language. Browser Use provides vision-based element detection, multi-tab support, and works with any LLM provider.
Browserbase MCP — Cloud Browser Automation Tools
Browserbase MCP server exposes automation tools (navigate, act, observe, extract) backed by Browserbase + Stagehand, letting agents operate real web pages.