Scripts · Mar 28, 2026 · 2 min read

Browser Use — AI Browser Automation

Open-source Python library for AI-driven browser automation. Works with Claude, GPT, and Gemini to fill forms, scrape data, and navigate websites.

TL;DR
Browser Use lets AI models like Claude and GPT control a browser to automate web tasks.
§01

What it is

Browser Use is an open-source Python library that connects large language models to a real browser. It gives AI agents the ability to navigate websites, fill forms, click buttons, extract data, and perform multi-step web tasks. The library supports Claude, GPT, Gemini, and other LLM providers as the reasoning engine behind the browser actions.

Browser Use targets developers building AI agents that need web interaction capabilities, QA engineers automating browser tests with natural language, and teams building web scraping pipelines that adapt to changing page layouts.

§02

How it saves time or tokens

Traditional browser automation (Selenium, Playwright) requires writing explicit selectors and step-by-step scripts that break when pages change. Browser Use delegates the page understanding to an LLM, which reads the DOM and decides what to click, type, or extract. This makes automations more resilient to layout changes and reduces the maintenance burden of selector-based scripts.

The library handles browser state management, screenshot capture for vision models, and action execution automatically, so you write high-level task descriptions instead of low-level browser commands.

§03

How to use

  1. Install Browser Use: pip install browser-use. Install a browser backend like Playwright: playwright install chromium.
  2. Configure your LLM provider (set API keys for OpenAI, Anthropic, or Google).
  3. Define a task in natural language and run the agent. Browser Use opens a browser, interprets the page, and executes the steps.
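The three steps above can be sketched as a shell session. The API-key variable names are the standard ones for each provider; `agent_task.py` is a hypothetical script name standing in for your own agent code:

```shell
# 1. Install the library and a Playwright-managed Chromium build
pip install browser-use
playwright install chromium

# 2. Configure an LLM provider via environment variables
export OPENAI_API_KEY="sk-..."   # or ANTHROPIC_API_KEY / GOOGLE_API_KEY

# 3. Run a script that defines the agent's task in natural language
python agent_task.py
```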
§04

Example

import asyncio

from browser_use import Agent
from langchain_openai import ChatOpenAI

async def main():
    # Describe the goal in natural language; the LLM plans the browser actions.
    agent = Agent(
        task='Go to google.com, search for browser-use github, and return the star count',
        llm=ChatOpenAI(model='gpt-4o'),
    )
    result = await agent.run()
    print(result)

asyncio.run(main())

The agent opens a browser, navigates to Google, types the search query, clicks the result, reads the star count, and returns it as structured output.

§05

Common pitfalls

  • Each browser action requires an LLM call, so token costs add up for complex multi-step tasks. Use cheaper models for simple navigation and reserve expensive models for complex reasoning steps.
  • Some websites block automated browsers. Use headless mode with caution and respect robots.txt and terms of service.
  • Vision-based page understanding (sending screenshots to the LLM) uses more tokens than DOM-text-based approaches. Choose the interaction mode based on your cost sensitivity.
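To see how per-action LLM calls accumulate, here is a minimal back-of-the-envelope cost estimator. It is pure Python, and the token counts and prices are illustrative assumptions, not measured Browser Use figures:

```python
def estimate_task_cost(actions: int, tokens_per_action: int,
                       price_per_million_tokens: float) -> float:
    """Rough cost of a multi-step browser task: one LLM call per action."""
    total_tokens = actions * tokens_per_action
    return total_tokens / 1_000_000 * price_per_million_tokens

# Illustrative numbers only: a 20-step task sending ~4,000 tokens of DOM
# context per action, at an assumed $5 per million input tokens.
cost = estimate_task_cost(actions=20, tokens_per_action=4_000,
                          price_per_million_tokens=5.0)
print(f"~${cost:.2f} per run")  # → ~$0.40 per run
```

Swapping in a cheaper model for simple navigation steps directly scales the `price_per_million_tokens` term, which is why mixed-model strategies pay off on long tasks.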

Frequently Asked Questions

Which LLM providers does Browser Use support?

Browser Use works with OpenAI (GPT-4o, GPT-4), Anthropic (Claude), Google (Gemini), and any LangChain-compatible model. The LLM is used as the reasoning engine that decides which browser actions to take based on the current page state.

How does Browser Use differ from Playwright or Selenium?

Playwright and Selenium require explicit CSS/XPath selectors and scripted step sequences. Browser Use uses an LLM to understand the page and decide actions dynamically. This makes it more resilient to page layout changes but costs API tokens per action.

Can Browser Use handle authentication and login flows?

Yes. Browser Use can navigate to login pages, fill in credentials, and handle multi-step authentication flows. You can provide credentials in the task description or through environment variables. It also supports cookie-based session persistence.
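One way to keep credentials out of source control is to read them from the environment and interpolate them into the task string at runtime. This is a minimal sketch; the variable names, site URL, and task wording are illustrative assumptions, not a Browser Use convention:

```python
import os

# Assumed variable names; set these in your shell or a .env file.
username = os.environ.get("DEMO_SITE_USER", "demo-user")
password = os.environ.get("DEMO_SITE_PASSWORD", "demo-pass")

# Build the natural-language task at runtime so credentials never
# appear as literals in the script itself.
task = (
    f"Go to example.com/login, log in as '{username}' with password "
    f"'{password}', then open the account dashboard."
)
print(task)
```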

Is Browser Use suitable for production scraping?

Browser Use works for production use cases where resilience to page changes matters more than raw speed. For high-volume scraping with stable page layouts, traditional Playwright scripts are faster and cheaper since they do not require LLM calls per action.

Does Browser Use work with local LLMs?

Yes. Any LangChain-compatible model works, including local models served via Ollama or vLLM. However, browser automation tasks require strong reasoning capabilities, so smaller local models may struggle with complex multi-step navigation.

Source & Thanks

Created by browser-use (⭐ 84,800+). Licensed under MIT. Docs: docs.browser-use.com

Thanks to the Browser Use team for building the leading open-source AI browser automation library.
