Stagehand — AI Browser Automation Framework
Three AI primitives — act(), extract(), observe() — to automate any website with natural language. By Browserbase. 21K+ stars.
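This asset is staged safely before use. The copied instruction asks the Agent to read the staged files and to confirm before activating any scripts, MCP configuration, or global configuration.
npx -y tokrepo@latest install 5114a013-a144-4020-8611-c38b74968b99 --target codex
The command stages the files first; read the staged README and install plan before activating anything.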
What it is
Stagehand is an AI browser automation framework built by Browserbase. It exposes three primitives -- act(), extract(), and observe() -- that let you automate web interactions using natural language instead of CSS selectors or XPaths. Behind the scenes, Stagehand uses vision models to understand the page and execute actions.
Stagehand is designed for developers building web scrapers, testing frameworks, or automation workflows who want to write instructions in plain English rather than brittle selector-based scripts. It runs locally or on Browserbase's cloud infrastructure.
How it saves time or tokens
Traditional browser automation with Puppeteer or Playwright requires writing and maintaining CSS selectors that break when the page layout changes. Stagehand's natural language approach is resilient to UI changes because it uses visual understanding rather than DOM structure.
The three-primitive API keeps the learning curve minimal. Instead of learning a complex automation framework, you write act('click the login button') and Stagehand handles the rest.
How to use
- Install Stagehand:
npm install @browserbasehq/stagehand
- Create an automation script:
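// Top-level await requires an ES module (a .mjs file or "type": "module" in package.json).
import { Stagehand } from '@browserbasehq/stagehand';
// Run a local browser and use GPT-4o; the API key is read from the environment.
const stagehand = new Stagehand({
  env: 'LOCAL',
  modelName: 'gpt-4o',
  modelClientOptions: { apiKey: process.env.OPENAI_API_KEY }
});
await stagehand.init(); // launch the browser
await stagehand.page.goto('https://example.com');
// Describe the action in plain English instead of passing a selector.
await stagehand.act({ action: 'click the sign up button' });
await stagehand.close(); // release the browser when done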
- Run your script with Node.js.
Example
import { Stagehand } from '@browserbasehq/stagehand';
import { z } from 'zod';
const stagehand = new Stagehand({ env: 'LOCAL' });
await stagehand.init();
await stagehand.page.goto('https://news.ycombinator.com');
// Extract structured data from the page.
// extract() takes a Zod schema; the array is wrapped in an object
// because the top-level schema is expected to be an object.
const { stories } = await stagehand.extract({
  instruction: 'Extract the title and URL of the top 5 stories',
  schema: z.object({
    stories: z.array(
      z.object({
        title: z.string(),
        url: z.string()
      })
    )
  })
});
console.log(stories);
await stagehand.close();
This extracts structured data from Hacker News using a natural language instruction and a Zod schema that describes the output format.
Related on TokRepo
- Browser automation tools -- Compare other AI-powered browser automation frameworks
- Web scraping tools -- Explore tools for extracting data from websites
Common pitfalls
- Stagehand requires an LLM API key (OpenAI or Anthropic). Each action sends a screenshot to the vision model, which costs tokens. High-frequency automation scripts can accumulate significant API costs.
- The LOCAL env mode requires a Chromium browser installed on the machine. If Chromium is missing, Stagehand will fail to initialize. Use Browserbase's cloud mode to avoid local browser management.
- Natural language instructions must be specific. Vague instructions like 'fill out the form' may produce unexpected results. Write precise actions like 'type john@example.com into the email field'.
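The cost pitfall above can be made concrete with a rough estimate. The sketch below is plain arithmetic and not part of Stagehand; the helper name estimateRunCostUsd is hypothetical, and every number in the usage line (tokens per action, price per million tokens) is a placeholder to be replaced with figures from your own provider's billing.

```typescript
// Hypothetical helper: rough cost of a run in which every action makes one model call.
interface CostInputs {
  actions: number;             // number of act/extract/observe calls in the run
  tokensPerAction: number;     // assumed average tokens consumed per call
  usdPerMillionTokens: number; // assumed blended price; check your provider
}

function estimateRunCostUsd({ actions, tokensPerAction, usdPerMillionTokens }: CostInputs): number {
  if (actions < 0 || tokensPerAction < 0 || usdPerMillionTokens < 0) {
    throw new RangeError("inputs must be non-negative");
  }
  const totalTokens = actions * tokensPerAction;
  return (totalTokens / 1_000_000) * usdPerMillionTokens;
}

// Placeholder numbers only: 200 actions at 2,000 tokens each, priced at $5 per million tokens.
const cost = estimateRunCostUsd({ actions: 200, tokensPerAction: 2_000, usdPerMillionTokens: 5 });
console.log(cost.toFixed(2)); // prints "2.00"
```

Multiplying the per-run figure by the number of runs per day shows quickly whether a high-frequency script is worth moving to cheaper selector-based steps.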
FAQ
Stagehand supports OpenAI models (GPT-4o, GPT-4o-mini) and Anthropic models (Claude Sonnet). The model is configured when initializing the Stagehand instance. Vision-capable models are required because Stagehand sends page screenshots for visual understanding.
Stagehand complements rather than replaces traditional automation tools. It uses Playwright under the hood for browser control. For stable pages with predictable selectors, Playwright is faster and cheaper. Stagehand is better for dynamic pages or when you want resilience to UI changes.
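One way to combine the two, sketched under assumptions: try the deterministic selector-based step first and fall back to a natural-language step only if it throws. withFallback is a hypothetical helper, not a Stagehand or Playwright API; the commented usage assumes the stagehand.page Playwright object and the act({ action }) call shape shown in the How to use section, and the "#signup" selector is illustrative.

```typescript
// Hypothetical helper: run the cheap deterministic step first, and only
// pay for the AI-driven step when the deterministic one fails.
type Step<T> = () => Promise<T>;

async function withFallback<T>(
  primary: Step<T>,
  fallback: Step<T>,
): Promise<{ value: T; usedFallback: boolean }> {
  try {
    return { value: await primary(), usedFallback: false };
  } catch {
    // Any error from the primary step (for example a selector timeout) triggers the fallback.
    return { value: await fallback(), usedFallback: true };
  }
}

// Usage sketch (assumes an initialized `stagehand` instance):
// await withFallback(
//   () => stagehand.page.click("#signup", { timeout: 2000 }),
//   () => stagehand.act({ action: "click the sign up button" }),
// );
```

Logging usedFallback over time gives a simple signal for which selectors have gone stale.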
Stagehand can perform login flows using act() to type credentials and click buttons. For repeated automation runs, you can persist browser cookies and session storage to avoid logging in each time. Stagehand exposes the underlying Playwright page object for cookie management.
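A minimal sketch of cookie persistence, assuming the Playwright page is reachable as stagehand.page as described above. The interfaces CookieLike and CookieStore and the helpers exportCookies and importCookies are hypothetical names introduced here; they mirror only the cookies() and addCookies() methods of Playwright's BrowserContext, and session storage is not covered.

```typescript
// Minimal shapes mirroring the parts of Playwright's BrowserContext used here.
// Real Playwright cookie objects carry more fields (expires, httpOnly, and so on),
// which survive the JSON round trip unchanged.
interface CookieLike {
  name: string;
  value: string;
  domain?: string;
  path?: string;
}

interface CookieStore {
  cookies(): Promise<CookieLike[]>;
  addCookies(cookies: CookieLike[]): Promise<void>;
}

// Serialize the current session cookies to a JSON string.
// Store the result as carefully as a password: it grants access to the session.
async function exportCookies(context: CookieStore): Promise<string> {
  return JSON.stringify(await context.cookies());
}

// Restore previously exported cookies into a fresh browser context.
// Returns the number of cookies restored.
async function importCookies(context: CookieStore, json: string): Promise<number> {
  const parsed: unknown = JSON.parse(json);
  if (!Array.isArray(parsed)) {
    throw new TypeError("expected a JSON array of cookies");
  }
  await context.addCookies(parsed as CookieLike[]);
  return parsed.length;
}

// Usage sketch (assumes an initialized `stagehand` instance):
// const saved = await exportCookies(stagehand.page.context());
// ...in a later run, before navigating to the site:
// await importCookies(stagehand.page.context(), saved);
```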
LOCAL mode runs a Chromium browser on your machine. BROWSERBASE mode runs the browser on Browserbase's cloud infrastructure, which handles browser lifecycle, proxy rotation, and captcha solving. BROWSERBASE requires a Browserbase API key and account.
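The choice between the two modes can be driven by which credentials are available. The sketch below is one possible approach, not Stagehand's own logic: chooseEnv is a hypothetical helper, and the projectId option and the BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID variable names are assumptions that are not stated on this page.

```typescript
// Hypothetical helper: pick the `env` value from the credentials that are present.
type StagehandEnvConfig =
  | { env: "LOCAL" }
  | { env: "BROWSERBASE"; apiKey: string; projectId: string };

function chooseEnv(vars: Record<string, string | undefined>): StagehandEnvConfig {
  const apiKey = vars["BROWSERBASE_API_KEY"];       // assumed variable name
  const projectId = vars["BROWSERBASE_PROJECT_ID"]; // assumed variable name
  // Use the cloud browser only when both credentials are set;
  // otherwise fall back to a Chromium browser on this machine.
  if (apiKey && projectId) {
    return { env: "BROWSERBASE", apiKey, projectId };
  }
  return { env: "LOCAL" };
}

// Usage sketch:
// const stagehand = new Stagehand({ ...chooseEnv(process.env), modelName: "gpt-4o" });
```

Keeping the decision in one function lets the same script run locally during development and in the cloud in CI without code changes.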
Stagehand can be used for testing: observe() can verify page state, and extract() can check for specific content. However, traditional testing frameworks like Playwright Test or Cypress are better suited to deterministic test assertions. Stagehand is a better fit for exploratory testing and dynamic page validation.
Sources (3)
- Stagehand GitHub: three AI primitives (act, extract, observe)
- Browserbase: built by Browserbase for AI-native browser automation
- Stagehand Documentation: uses vision models for page understanding
Source and acknowledgments
Created by Browserbase. Licensed under MIT.
stagehand — ⭐ 21,800+
Thanks to the Browserbase team for creating the most elegant API for AI browser automation.
Related assets
Stagehand — AI-Powered Browser Automation SDK
TypeScript SDK that lets you automate browsers using natural language and visual understanding. AI sees the page like a human does. Built on Playwright. 10,000+ GitHub stars.
Selenium — Browser Automation Framework and Ecosystem
Selenium is the original browser automation framework for testing web applications. WebDriver API supports Chrome, Firefox, Safari, Edge across Java, Python, C#, Ruby, JavaScript. The industry standard for E2E web testing since 2004.
Browser Use — AI Agent Browser Automation
Let AI agents control web browsers with natural language. Browser Use provides vision-based element detection, multi-tab support, and works with any LLM provider.
Browserbase MCP — Cloud Browser Automation Tools
Browserbase MCP server exposes automation tools (navigate, act, observe, extract) backed by Browserbase + Stagehand, letting agents operate real web pages.