Browser Use — AI Browser Automation
Open-source Python library for AI-driven browser automation. Works with Claude, GPT, and Gemini to fill forms, scrape data, and navigate websites.
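Review before installing
This asset needs review first. The copied instructions ask the agent to do a dry run and list the files it will write, then continue only after you confirm.
npx -y tokrepo@latest install 52993269-0cbd-49e2-bb6a-54f429a5feab --target codex
Do a dry run first and confirm the files to be written before running this command.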
What it is
Browser Use is an open-source Python library that connects large language models to a real browser. It gives AI agents the ability to navigate websites, fill forms, click buttons, extract data, and perform multi-step web tasks. The library supports Claude, GPT, Gemini, and other LLM providers as the reasoning engine behind the browser actions.
Browser Use targets developers building AI agents that need web interaction capabilities, QA engineers automating browser tests with natural language, and teams building web scraping pipelines that adapt to changing page layouts.
How it saves time or tokens
Traditional browser automation (Selenium, Playwright) requires writing explicit selectors and step-by-step scripts that break when pages change. Browser Use delegates the page understanding to an LLM, which reads the DOM and decides what to click, type, or extract. This makes automations more resilient to layout changes and reduces the maintenance burden of selector-based scripts.
The library handles browser state management, screenshot capture for vision models, and action execution automatically, so you write high-level task descriptions instead of low-level browser commands.
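The idea of letting a model decide the next browser action can be illustrated with a small observe-decide-act loop. This is a minimal sketch, not Browser Use's actual internals: the `Action` class, the `run_agent` function, the dictionary-based `page`, and the `stub_decide` function (which stands in for the LLM call) are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    kind: str          # "type", "click", or "done"
    target: str = ""   # label of the element the action applies to
    text: str = ""     # text to type, or the final answer for "done"


def run_agent(page: dict, decide: Callable[[dict, list], Action], max_steps: int = 10):
    """Ask `decide` for the next action until it returns a "done" action."""
    history = []
    for _ in range(max_steps):
        action = decide(page, history)  # in a real agent, this is one LLM call
        history.append(action)
        if action.kind == "done":
            return action.text, history
        if action.kind == "type":
            page["fields"][action.target] = action.text
        elif action.kind == "click":
            page["clicked"].append(action.target)
    raise RuntimeError("step limit reached before the task finished")


def stub_decide(page: dict, history: list) -> Action:
    """Hypothetical stand-in for the LLM: fill the search box, click, finish."""
    if not page["fields"].get("search"):
        return Action("type", "search", "browser-use github")
    if "submit" not in page["clicked"]:
        return Action("click", "submit")
    return Action("done", text="searched for " + page["fields"]["search"])


# A toy page state; a real agent would observe the live DOM or a screenshot.
page = {"fields": {"search": ""}, "clicked": []}
answer, history = run_agent(page, stub_decide)
print(answer, len(history))
```

Because the decision function inspects the current page state on every step instead of replaying fixed selectors, the same loop keeps working when the page changes, at the price of one model call per step.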
How to use
- Install Browser Use: pip install browser-use.
- Install a browser backend such as Playwright: playwright install chromium.
- Configure your LLM provider (set API keys for OpenAI, Anthropic, or Google).
- Define a task in natural language and run the agent. Browser Use opens a browser, interprets the page, and executes the steps.
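Before running an agent, it can help to check that the API key for your chosen provider is actually set. The sketch below assumes the environment variable names `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, and `GOOGLE_API_KEY`; these are the names commonly read by the provider SDKs, but confirm them against the SDK you use. The helper names `configured_providers` and `require_provider` are hypothetical.

```python
import os

# Assumed variable names; verify against your provider SDK's documentation.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
}


def configured_providers(env=None):
    """Return the providers whose API key variable is set and non-empty."""
    env = os.environ if env is None else env
    return sorted(name for name, var in PROVIDER_KEYS.items() if env.get(var))


def require_provider(name, env=None):
    """Raise a clear error if the key for `name` is missing; return the variable name."""
    env = os.environ if env is None else env
    var = PROVIDER_KEYS[name]
    if not env.get(var):
        raise RuntimeError(f"set {var} before running the agent")
    return var


print("configured providers:", configured_providers())
```

Failing fast on a missing key avoids launching a browser session that would stop at the first model call.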
Example
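import asyncio

from browser_use import Agent
from langchain_openai import ChatOpenAI

async def main():
    # Describe the task in natural language; the LLM plans the browser steps.
    agent = Agent(
        task='Go to google.com, search for browser-use github, and return the star count',
        llm=ChatOpenAI(model='gpt-4o'),
    )
    # Run the agent and wait for it to finish.
    result = await agent.run()
    print(result)

asyncio.run(main())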
The agent opens a browser, navigates to Google, types the search query, clicks the result, reads the star count, and returns it as structured output.
Common pitfalls
- Each browser action requires an LLM call, so token costs add up for complex multi-step tasks. Use cheaper models for simple navigation and reserve expensive models for complex reasoning steps.
- Some websites block automated browsers. Use headless mode with caution and respect robots.txt and terms of service.
- Vision-based page understanding (sending screenshots to the LLM) uses more tokens than DOM-text-based approaches. Choose the interaction mode based on your cost sensitivity.
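A rough cost estimate makes these trade-offs concrete. The sketch below assumes one LLM call per browser step and uses made-up token counts and prices purely for illustration; substitute your provider's real pricing and your own measured token usage. The function name `estimate_cost` is hypothetical.

```python
def estimate_cost(steps, input_tokens_per_step, output_tokens_per_step,
                  usd_per_million_input, usd_per_million_output):
    """Estimate the USD cost of a run that makes one LLM call per step."""
    input_cost = steps * input_tokens_per_step * usd_per_million_input / 1_000_000
    output_cost = steps * output_tokens_per_step * usd_per_million_output / 1_000_000
    return input_cost + output_cost


# Hypothetical figures: 20 steps, with screenshots adding input tokens per step.
dom_run = estimate_cost(20, 4_000, 200, 2.50, 10.00)
vision_run = estimate_cost(20, 6_000, 200, 2.50, 10.00)
print(f"DOM text: ${dom_run:.2f}, with screenshots: ${vision_run:.2f}")
```

Since cost scales linearly with the number of steps, shortening a task or routing simple steps to a cheaper model reduces the total in direct proportion.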
FAQ
Which LLMs does Browser Use support?
Browser Use works with OpenAI (GPT-4o, GPT-4), Anthropic (Claude), Google (Gemini), and any LangChain-compatible model. The LLM is the reasoning engine that decides which browser actions to take based on the current page state.
How does Browser Use differ from Playwright or Selenium?
Playwright and Selenium require explicit CSS/XPath selectors and scripted step sequences. Browser Use uses an LLM to understand the page and decide actions dynamically. This makes it more resilient to page layout changes, but each action costs API tokens.
Can Browser Use handle logins and authentication?
Yes. Browser Use can navigate to login pages, fill in credentials, and handle multi-step authentication flows. You can provide credentials in the task description or through environment variables. It also supports cookie-based session persistence.
Is Browser Use suitable for production?
Browser Use works for production use cases where resilience to page changes matters more than raw speed. For high-volume scraping with stable page layouts, traditional Playwright scripts are faster and cheaper because they do not require an LLM call per action.
Can I use local models?
Yes. Any LangChain-compatible model works, including local models served via Ollama or vLLM. However, browser automation tasks require strong reasoning capabilities, so smaller local models may struggle with complex multi-step navigation.
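For the login case, one way to keep credentials out of source code is to read them from environment variables and build the task text at run time. This is a minimal sketch under stated assumptions: the variable names `SITE_USERNAME` and `SITE_PASSWORD`, the helper names, and the example URL are hypothetical and not part of Browser Use.

```python
import os


def load_credentials(env=None, user_var="SITE_USERNAME", pass_var="SITE_PASSWORD"):
    """Read credentials from the environment and fail fast if any are missing."""
    env = os.environ if env is None else env
    missing = [var for var in (user_var, pass_var) if not env.get(var)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {"username": env[user_var], "password": env[pass_var]}


def build_login_task(url, creds):
    """Compose a natural-language task that includes the credentials."""
    return (
        f"Go to {url}, log in with username {creds['username']} "
        f"and password {creds['password']}, then confirm the login succeeded."
    )


# Example with an explicit mapping instead of the real environment.
demo_env = {"SITE_USERNAME": "alice", "SITE_PASSWORD": "s3cret"}
task = build_login_task("https://example.com/login", load_credentials(demo_env))
print(task)
```

Note that any credential placed in the task description is sent to the LLM provider as part of the prompt, so this approach only suits accounts where that exposure is acceptable.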
Source and acknowledgments
Created by browser-use. Licensed under MIT. GitHub stars: 84,800+. Docs: docs.browser-use.com
Thanks to the Browser Use team for building the leading open-source AI browser automation library.