Scripts · Apr 2, 2026 · 2 min read

Stagehand — AI Browser Automation Framework

Three AI primitives — act(), extract(), observe() — to automate any website with natural language. By Browserbase. 21K+ stars.

TL;DR
Stagehand uses act(), extract(), and observe() to automate any website using natural language instructions.
§01

What it is

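Stagehand is an AI browser automation framework built by Browserbase. It exposes three primitives that let you automate web interactions with natural-language instructions instead of CSS selectors or XPaths: act() performs an action such as a click or a keystroke, extract() returns structured data from the page, and observe() identifies the elements or actions available on it. Behind the scenes, Stagehand sends a representation of the current page, together with your instruction, to a large language model, which decides what to act on or what to return.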

Stagehand is aimed at developers who build web scrapers, test suites, or automation workflows and would rather write instructions in plain English than maintain brittle selector-based scripts. It runs locally or on Browserbase's cloud infrastructure.

§02

How it saves time or tokens

Traditional browser automation with Puppeteer or Playwright relies on CSS selectors or XPaths that are written and maintained by hand, and they break when the page's markup changes. Stagehand's natural-language approach is more resilient to such changes: the model locates the target element from the instruction each time the script runs, rather than following a hardcoded path through the DOM.
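A toy illustration of the difference, in plain JavaScript and without Stagehand: a lookup keyed on a class name breaks when a redesign renames the class, while a lookup keyed on the element's role and visible text still succeeds. The page model and both helper functions are hypothetical, and the crude text match in findByIntent only stands in for the model; it is not how Stagehand locates elements.

```javascript
// Hypothetical, simplified page model: each element has a tag, a CSS class,
// and visible text. This is not Stagehand's internal representation.
const before = [{ tag: 'button', className: 'btn-login', label: 'Log in' }];
// A redesign renames the class but keeps the visible text.
const after = [{ tag: 'button', className: 'auth__submit', label: 'Log in' }];

// Selector-style lookup: depends on an implementation detail (the class name).
function findBySelector(page, className) {
  return page.find((el) => el.className === className);
}

// Intent-style lookup: matches on role and visible text, a crude stand-in for
// an LLM resolving an instruction such as "click the log in button".
function findByIntent(page, tag, text) {
  const wanted = text.toLowerCase();
  return page.find((el) => el.tag === tag && el.label.toLowerCase().includes(wanted));
}

console.log(findBySelector(before, 'btn-login')?.label);     // prints: Log in
console.log(findBySelector(after, 'btn-login'));             // prints: undefined
console.log(findByIntent(after, 'button', 'log in')?.label); // prints: Log in
```

The flip side is that an intent-style lookup is only as reliable as its interpretation of the instruction, which is why precise wording matters.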

The three-primitive API keeps the learning curve small. Instead of learning a large automation API, you write an instruction such as act('click the login button'), and Stagehand locates the button and performs the click.

§03

How to use

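  1. Install Stagehand, along with zod, which extract() uses for output schemas:
```shell
npm install @browserbasehq/stagehand zod
```
  2. Create an automation script:
```javascript
import { Stagehand } from '@browserbasehq/stagehand';

// LOCAL launches a browser on this machine; the model reads the page and
// turns natural-language instructions into browser actions.
const stagehand = new Stagehand({
  env: 'LOCAL',
  modelName: 'gpt-4o',
  modelClientOptions: { apiKey: process.env.OPENAI_API_KEY }
});

await stagehand.init();
await stagehand.page.goto('https://example.com');
await stagehand.act({ action: 'click the sign up button' });

// Shut down the browser so the Node.js process can exit.
await stagehand.close();
```
  3. Run the script with Node.js, with OPENAI_API_KEY set in the environment. The script uses top-level await, so save it as an ES module (for example, script.mjs).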
§04

Example

```javascript
import { Stagehand } from '@browserbasehq/stagehand';
import { z } from 'zod';

// No model options here, so Stagehand's default model settings apply.
const stagehand = new Stagehand({ env: 'LOCAL' });
await stagehand.init();
await stagehand.page.goto('https://news.ycombinator.com');

// Extract structured data from the page. extract() takes a zod object schema,
// so the list of stories is wrapped in an object.
const { stories } = await stagehand.extract({
  instruction: 'Extract the title and URL of the top 5 stories',
  schema: z.object({
    stories: z.array(
      z.object({
        title: z.string(),
        url: z.string()
      })
    )
  })
});

console.log(stories);
await stagehand.close();
```

This extracts the top stories from Hacker News as structured data, using a natural-language instruction and a zod schema that defines the shape of the output.
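A schema constrains the shape of what comes back, but not whether the values are usable: a title can come back empty, and Hacker News links self-posts with relative URLs such as item?id=... . The sketch below is a hypothetical post-processing step in plain JavaScript, not part of Stagehand; normalizeStories and its filtering rules are assumptions. It drops entries without a title and resolves relative links against the page's base URL.

```javascript
// Hypothetical post-processing for extracted stories: keep entries that have a
// non-empty title and a URL that resolves to http(s), and make every URL absolute.
function normalizeStories(stories, baseUrl) {
  const result = [];
  for (const story of stories) {
    if (typeof story?.title !== 'string' || story.title.trim() === '') continue;
    if (typeof story.url !== 'string') continue;
    let resolved;
    try {
      // new URL(relative, base) resolves links like "item?id=1" against the page.
      resolved = new URL(story.url, baseUrl);
    } catch {
      continue; // not a parseable URL
    }
    if (resolved.protocol !== 'http:' && resolved.protocol !== 'https:') continue;
    result.push({ title: story.title.trim(), url: resolved.href });
  }
  return result;
}

const cleaned = normalizeStories(
  [
    { title: 'Show HN: A thing', url: 'https://example.com/thing' },
    { title: 'Ask HN: A question', url: 'item?id=1' },
    { title: '', url: 'https://example.com/empty' },
  ],
  'https://news.ycombinator.com/',
);
console.log(cleaned.length); // prints: 2
console.log(cleaned[1].url); // prints: https://news.ycombinator.com/item?id=1
```

In the Hacker News example, this would be called as normalizeStories(stories, 'https://news.ycombinator.com/').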

§05

Common pitfalls

  • Stagehand requires an LLM API key (for example, OpenAI or Anthropic). Every act(), extract(), or observe() call sends page content to the model, which costs tokens, so high-frequency automation scripts can accumulate significant API costs.
  • LOCAL mode needs a Chromium browser on the machine; if none is available, Stagehand fails to initialize. Installing Playwright's Chromium build (for example, npx playwright install chromium) is one way to provide it, or use BROWSERBASE mode to avoid managing a local browser at all.
  • Natural-language instructions must be specific. A vague instruction such as 'fill out the form' may produce unexpected results; write precise, single-step actions such as 'type john@example.com into the email field'.
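To see how quickly costs add up: each model call costs roughly (input tokens × input price + output tokens × output price), multiplied by calls per run and runs per day. The sketch below encodes that arithmetic; the function name and every number in it are hypothetical placeholders, not measured Stagehand usage or current provider prices.

```javascript
// Back-of-the-envelope daily LLM cost for an automation script.
// Every input is an assumption you supply: token counts depend on the pages you
// visit, and prices on your provider and model. None are Stagehand defaults.
function dailyCostUSD({
  callsPerRun,         // act()/extract()/observe() calls in one run
  inputTokensPerCall,  // page content plus instruction sent to the model
  outputTokensPerCall, // tokens in the model's response
  inputPricePerMTok,   // USD per 1M input tokens
  outputPricePerMTok,  // USD per 1M output tokens
  runsPerDay,
}) {
  const perCall =
    (inputTokensPerCall * inputPricePerMTok + outputTokensPerCall * outputPricePerMTok) / 1_000_000;
  return perCall * callsPerRun * runsPerDay;
}

// Illustrative numbers only.
const estimate = dailyCostUSD({
  callsPerRun: 10,
  inputTokensPerCall: 20_000,
  outputTokensPerCall: 500,
  inputPricePerMTok: 2.5,
  outputPricePerMTok: 10,
  runsPerDay: 100,
});
console.log(estimate.toFixed(2)); // prints: 55.00
```

With these placeholder figures, 1,000 model calls a day come to about $55, so it is worth measuring your real token counts before scaling a script up.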

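One way to follow the advice on specific instructions when filling a form is to generate one precise instruction per field instead of a single vague one. The helper below is a hypothetical sketch in plain JavaScript; passing each resulting string to act() in turn is an assumed usage pattern. Keep in mind that any value embedded in an instruction is sent to the model provider, so avoid putting secrets such as passwords into instruction text.

```javascript
// Hypothetical helper: build one specific instruction per form field, plus a
// final submit step, from a map of field labels to values.
function formInstructions(fields, submitLabel) {
  const steps = Object.entries(fields).map(
    ([label, value]) => `type ${value} into the ${label} field`,
  );
  steps.push(`click the ${submitLabel} button`);
  return steps;
}

const formSteps = formInstructions(
  { 'full name': 'John Doe', email: 'john@example.com' },
  'Sign up',
);
for (const step of formSteps) console.log(step);
// prints:
// type John Doe into the full name field
// type john@example.com into the email field
// click the Sign up button
```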
Frequently Asked Questions

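Which LLMs does Stagehand support?

Stagehand supports OpenAI models (such as GPT-4o and GPT-4o-mini) and Anthropic models (such as Claude Sonnet); check the documentation for your version, since the list of supported providers changes over time. The model is configured when initializing the Stagehand instance, for example with modelName and modelClientOptions. Because the model has to interpret the page content it receives and pick the right element, more capable models generally give more reliable results.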

Can Stagehand replace Playwright or Puppeteer?

Stagehand complements rather than replaces traditional automation tools. It uses Playwright under the hood for browser control. For stable pages with predictable selectors, Playwright is faster and cheaper. Stagehand is better for dynamic pages or when you want resilience to UI changes.
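In practice the two can be combined: try a fast, cheap selector-based step first and fall back to a natural-language step only when it fails. The sketch below shows that control flow with injected callbacks; withFallback is hypothetical, and nothing in it calls Playwright or Stagehand. In a real script the first callback might be a Playwright click and the second an act() call.

```javascript
// Run the fast path; if it throws (for example, a selector that no longer
// matches), run the fallback instead. Reports which path succeeded.
async function withFallback(fastStep, fallbackStep) {
  try {
    await fastStep();
    return 'fast';
  } catch {
    await fallbackStep(); // a failure here propagates to the caller
    return 'fallback';
  }
}

// Usage with stand-in steps.
async function demo() {
  const first = await withFallback(async () => {}, async () => {});
  const second = await withFallback(
    async () => { throw new Error('selector not found'); },
    async () => {},
  );
  console.log(first, second); // prints: fast fallback
}
void demo();
```

Playwright actions keep waiting for an element until their timeout expires before throwing, so the fast path should use a short timeout option; otherwise every fallback starts with a long wait.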

How does Stagehand handle authentication?

Stagehand can perform login flows using act() to type credentials and click buttons. For repeated automation runs, you can persist browser cookies and local storage to avoid logging in each time. Stagehand exposes the underlying Playwright page object, and its browser context (page.context()) provides Playwright's cookie management methods.
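Playwright can save a context's cookies and local storage as JSON with browserContext.storageState(), and cookies can be restored with browserContext.addCookies(); how you wire this into a Stagehand session depends on the version you use. Before reusing saved state, a script can check that the session cookie has not expired and run the act()-based login otherwise. The sketch below is hypothetical: canReuseSession is not a library function, the cookie name is site-specific, and treating session cookies as unusable is a deliberately conservative choice.

```javascript
// `state` has the shape produced by Playwright's browserContext.storageState():
// { cookies: [{ name, value, domain, path, expires, ... }], origins: [...] }.
// Cookie `expires` is in Unix seconds, or -1 for a session cookie.
function canReuseSession(state, cookieName, nowSeconds = Date.now() / 1000) {
  const cookie = state.cookies.find((c) => c.name === cookieName);
  if (!cookie) return false;
  // Conservative choice: session cookies are treated as not reusable.
  if (cookie.expires === -1) return false;
  // Require at least 60 seconds of remaining lifetime as a safety margin.
  return cookie.expires > nowSeconds + 60;
}

const saved = {
  cookies: [{ name: 'session_id', value: 'abc', domain: 'example.com', path: '/', expires: 2_000_000_000 }],
  origins: [],
};
console.log(canReuseSession(saved, 'session_id', 1_700_000_000)); // prints: true
console.log(canReuseSession(saved, 'session_id', 2_000_000_000)); // prints: false
```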

What is the difference between LOCAL and BROWSERBASE modes?

LOCAL mode runs a Chromium browser on your machine. BROWSERBASE mode runs the browser on Browserbase's cloud infrastructure, which handles the browser lifecycle, proxy rotation, and captcha solving. BROWSERBASE mode requires a Browserbase account and API key, plus a project ID.
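A common pattern is to pick the mode from the environment, so the same script runs locally during development and on Browserbase when credentials are present. The sketch below assumes the env, apiKey, and projectId constructor options and the BROWSERBASE_API_KEY / BROWSERBASE_PROJECT_ID variable names used in Stagehand's examples; verify both against the version you install. chooseMode itself is hypothetical and returns a plain options object rather than creating a Stagehand instance.

```javascript
// Return the mode-related subset of Stagehand constructor options (assumed names).
// BROWSERBASE is chosen only when both credentials are present and non-empty.
function chooseMode(vars) {
  const apiKey = vars.BROWSERBASE_API_KEY;
  const projectId = vars.BROWSERBASE_PROJECT_ID;
  if (apiKey && projectId) {
    return { env: 'BROWSERBASE', apiKey, projectId };
  }
  return { env: 'LOCAL' };
}

console.log(chooseMode({}).env); // prints: LOCAL
console.log(chooseMode({ BROWSERBASE_API_KEY: 'key', BROWSERBASE_PROJECT_ID: 'proj' }).env); // prints: BROWSERBASE
```

In a script, the result could be spread into the constructor options, as in new Stagehand({ ...chooseMode(process.env), modelName: 'gpt-4o' }).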

Can I use Stagehand for web testing?

Yes. Stagehand's observe() primitive can verify page state, and extract() can check for specific content. However, traditional testing frameworks such as Playwright Test or Cypress are better suited to deterministic test assertions. Stagehand is better suited to exploratory testing and to validating dynamic pages.
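One way to keep assertions deterministic while still using Stagehand is to let the model only read the page, for example with extract(), and make the pass/fail decision in ordinary code. The comparison helper below is a hypothetical sketch; the extracted object is a stand-in for what an extract() call might return.

```javascript
// Compare extracted values against expected ones with strict equality and
// return a human-readable list of mismatches (empty means the check passed).
function checkExtracted(actual, expected) {
  const failures = [];
  for (const [key, want] of Object.entries(expected)) {
    const got = actual[key];
    if (got !== want) {
      failures.push(`${key}: expected ${JSON.stringify(want)}, got ${JSON.stringify(got)}`);
    }
  }
  return failures;
}

// Stand-in for data extracted from a checkout page.
const extracted = { itemCount: 3, total: '$42.00', couponApplied: false };
const failures = checkExtracted(extracted, { itemCount: 3, couponApplied: true });
console.log(failures.join('\n')); // prints: couponApplied: expected true, got false
```

Strict equality is enough for the primitive values shown here; nested objects would need a deep comparison.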


Source & Thanks

Created by Browserbase. Licensed under MIT.

stagehand — ⭐ 21,800+

Thanks to the Browserbase team for creating the most elegant API for AI browser automation.
