Browser Use — AI Browser Automation
Open-source Python library for AI-driven browser automation. Works with Claude, GPT, and Gemini to fill forms, scrape data, and navigate websites.
What it is
Browser Use is an open-source Python library that connects large language models to a real browser. It gives AI agents the ability to navigate websites, fill forms, click buttons, extract data, and perform multi-step web tasks. The library supports Claude, GPT, Gemini, and other LLM providers as the reasoning engine behind the browser actions.
Browser Use targets developers building AI agents that need web interaction capabilities, QA engineers automating browser tests with natural language, and teams building web scraping pipelines that adapt to changing page layouts.
How it saves time or tokens
Traditional browser automation (Selenium, Playwright) requires writing explicit selectors and step-by-step scripts that break when pages change. Browser Use delegates the page understanding to an LLM, which reads the DOM and decides what to click, type, or extract. This makes automations more resilient to layout changes and reduces the maintenance burden of selector-based scripts.
The library handles browser state management, screenshot capture for vision models, and action execution automatically, so you write high-level task descriptions instead of low-level browser commands.
How to use
- Install Browser Use: pip install browser-use
- Install a browser backend such as Playwright's Chromium: playwright install chromium
- Configure your LLM provider (set API keys for OpenAI, Anthropic, or Google).
- Define a task in natural language and run the agent. Browser Use opens a browser, interprets the page, and executes the steps.
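Before running an agent, it can help to confirm that at least one provider key is actually set. The sketch below uses only the standard library. The variable names OPENAI_API_KEY, ANTHROPIC_API_KEY, and GOOGLE_API_KEY are the names these providers' SDKs conventionally read, but treat them as assumptions and check your provider's documentation; the helper configured_providers is hypothetical and not part of Browser Use.

```python
import os

# Conventional environment variable names for each provider's SDK.
# These are assumptions; confirm them against your provider's docs.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
}


def configured_providers(env=None):
    """Return the providers whose API key variable is set and non-empty."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var, "").strip()]


if __name__ == "__main__":
    found = configured_providers()
    if found:
        print("Configured providers:", ", ".join(found))
    else:
        print("No provider API key found; set one of:", ", ".join(PROVIDER_KEYS.values()))
```

Running a check like this first turns a missing key into a clear message instead of a provider authentication error partway through a browser session.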
Example
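import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI

async def main():
    # Describe the task in plain language; the LLM decides the browser actions.
    agent = Agent(
        task='Go to google.com, search for browser-use github, and return the star count',
        llm=ChatOpenAI(model='gpt-4o'),
    )
    result = await agent.run()
    print(result)

# agent.run() is a coroutine, so it needs an event loop to drive it.
asyncio.run(main())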
The agent opens a browser, navigates to Google, types the search query, clicks the result, reads the star count, and returns it as structured output.
Related on TokRepo
- AI tools for browser automation — Compare browser automation solutions
- AI tools for web scraping — Data extraction tools for the web
Common pitfalls
- Each browser action requires an LLM call, so token costs add up for complex multi-step tasks. Use cheaper models for simple navigation and reserve expensive models for complex reasoning steps.
- Some websites block automated browsers. Use headless mode with caution and respect robots.txt and terms of service.
- Vision-based page understanding (sending screenshots to the LLM) uses more tokens than DOM-text-based approaches. Choose the interaction mode based on your cost sensitivity.
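The cost points above can be made concrete with some back-of-the-envelope arithmetic. This is a minimal sketch, assuming one LLM call per browser action; every number in it (prices, tokens per step, step count) is a placeholder chosen to show the calculation, not a measurement of Browser Use or any real model's pricing.

```python
from dataclasses import dataclass


@dataclass
class ModelPrice:
    """Hypothetical prices in USD per one million tokens."""
    input_per_million: float
    output_per_million: float


def estimate_run_cost(steps, input_tokens_per_step, output_tokens_per_step, price):
    """Estimate the cost of a run that makes one LLM call per browser action."""
    input_cost = steps * input_tokens_per_step * price.input_per_million / 1_000_000
    output_cost = steps * output_tokens_per_step * price.output_per_million / 1_000_000
    return input_cost + output_cost


# Placeholder prices; substitute your provider's published rates.
cheap = ModelPrice(input_per_million=0.5, output_per_million=2.0)
expensive = ModelPrice(input_per_million=5.0, output_per_million=20.0)

# Assumed per-step prompt sizes: DOM text only vs. DOM text plus a screenshot.
DOM_TOKENS = 4_000
VISION_TOKENS = 6_000

for label, price in (("cheap", cheap), ("expensive", expensive)):
    for mode, tokens in (("dom", DOM_TOKENS), ("vision", VISION_TOKENS)):
        cost = estimate_run_cost(
            steps=20,
            input_tokens_per_step=tokens,
            output_tokens_per_step=200,
            price=price,
        )
        print(f"{label:9s} {mode:6s} ${cost:.4f}")
```

Because cost scales linearly with the number of steps, both the choice of model and the per-step prompt size multiply across the whole task, which is why the two levers in the list above matter most for long, multi-step runs.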
Frequently Asked Questions
Which LLMs does Browser Use support?
Browser Use works with OpenAI (GPT-4o, GPT-4), Anthropic (Claude), Google (Gemini), and any LangChain-compatible model. The LLM acts as the reasoning engine that decides which browser actions to take based on the current page state.
How does Browser Use differ from Playwright or Selenium?
Playwright and Selenium require explicit CSS/XPath selectors and scripted step sequences. Browser Use relies on an LLM to understand the page and decide actions dynamically. This makes it more resilient to page layout changes, but every action costs API tokens.
Can Browser Use handle login flows?
Yes. Browser Use can navigate to login pages, fill in credentials, and handle multi-step authentication flows. You can provide credentials in the task description or through environment variables. It also supports cookie-based session persistence.
Is Browser Use suitable for production?
Browser Use fits production use cases where resilience to page changes matters more than raw speed. For high-volume scraping with stable page layouts, traditional Playwright scripts are faster and cheaper because they do not require an LLM call per action.
Can I use local models?
Yes. Any LangChain-compatible model works, including local models served via Ollama or vLLM. However, browser automation tasks require strong reasoning capabilities, so smaller local models may struggle with complex multi-step navigation.
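The login answer above mentions supplying credentials through environment variables. One way this might look is sketched below: the script reads the credentials from the environment and builds the natural-language task from them. SITE_USERNAME, SITE_PASSWORD, and build_login_task are hypothetical names for this sketch and are not part of Browser Use. Note that anything placed in the task text is sent to the LLM provider, so this keeps secrets out of source code but not out of the prompt.

```python
import os


def build_login_task(login_url, env=None):
    """Build a natural-language login task from credentials in environment variables.

    SITE_USERNAME and SITE_PASSWORD are hypothetical variable names for this sketch.
    """
    env = os.environ if env is None else env
    missing = [var for var in ("SITE_USERNAME", "SITE_PASSWORD") if not env.get(var)]
    if missing:
        # Fail before any browser or LLM call is made.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return (
        f"Go to {login_url}, sign in with username {env['SITE_USERNAME']} "
        f"and password {env['SITE_PASSWORD']}, then confirm the login succeeded."
    )
```

The returned string would then be passed as the task when constructing the agent, in the same way as the literal task string in the example earlier on this page.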
Source & Thanks
Created by browser-use. Licensed under MIT. GitHub: browser-use (⭐ 84,800+). Docs: docs.browser-use.com
Thanks to the Browser Use team for building the leading open-source AI browser automation library.