Browser Use — AI Agent Browser Automation
Let AI agents control web browsers with natural language. Browser Use provides vision-based element detection, multi-tab support, and works with any LLM provider.
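This asset is staged safely
The asset is staged first. The copied instructions ask the agent to read the staged files and confirm before activating scripts, MCP configuration, or global configuration.
npx -y tokrepo@latest install 3d04e209-6c1a-4608-8e43-95b2cd7316d5 --target codex
The command stages the files first; read the staged README and install plan before activating anything.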
What it is
Browser Use is a Python library that lets AI agents control web browsers using natural language instructions. It provides vision-based element detection (the agent sees the page as a screenshot), multi-tab support, and works with any LLM provider including OpenAI, Anthropic, and local models.
Browser Use targets developers building AI agents that need to interact with web applications: filling forms, navigating dashboards, scraping dynamic content, or automating workflows that lack APIs.
How it saves time or tokens
Browser Use handles the complexity of browser automation (DOM parsing, element location, screenshot capture, action execution) behind a simple Python API. Instead of writing Playwright scripts for every web interaction, the agent describes what to do in natural language and Browser Use translates that into browser actions.
The vision-based approach means the agent works with any website without needing CSS selectors or XPaths.
How to use
- Install Browser Use: pip install browser-use
- Set up your LLM provider API key
- Create an agent with a task description
- Run the agent and watch it navigate the browser
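The API key step above can be checked before any browser is launched. The following is a minimal sketch, not part of Browser Use: the helper name `missing_keys` is hypothetical, and it assumes the provider reads its key from an environment variable (`OPENAI_API_KEY` is the name the OpenAI client libraries look for by default; other providers use different names).

```python
import os


def missing_keys(required=("OPENAI_API_KEY",), env=None):
    """Return the names of required variables that are unset or empty.

    `required` and `env` are illustrative parameters; pass the variable
    names your own provider expects.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Set these before running the agent:", ", ".join(missing))
    else:
        print("API key found.")
```

Running the check first gives a clear message instead of an authentication error partway through an agent run.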
Example
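import asyncio

from browser_use import Agent
# Assumption: a browser-use release that accepts LangChain chat models;
# newer releases may ship their own chat model wrappers instead.
from langchain_openai import ChatOpenAI


async def main():
    # The task is plain natural language; the agent plans the browser actions.
    agent = Agent(
        task='Go to google.com, search for browser automation tools, and extract the top 5 results',
        llm=ChatOpenAI(model='gpt-4o'),
    )
    # Run the agent to completion and print whatever it returns.
    result = await agent.run()
    print(result)


asyncio.run(main())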
The agent opens a browser, navigates to Google, types the search query, reads the results, and returns what it extracted.
Related on TokRepo
- Browser automation tools -- AI-powered browser control
- Web scraping tools -- Data extraction from websites
Common pitfalls
- Vision-based detection is slower than DOM-based selectors; expect 2-5 seconds per action
- CAPTCHAs and bot detection can block automated browsing; Browser Use does not bypass these protections
- Token usage is high because screenshots are sent to the LLM on every step; limit the number of steps for cost control
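The latency figure above can be turned into a rough step budget. This is a back-of-the-envelope sketch that uses only the 2-5 seconds per action quoted in the list; the function names are hypothetical and real timings depend on the model, the page, and the network.

```python
# Figure quoted in the pitfalls above: roughly 2-5 seconds per action.
SECONDS_PER_ACTION = (2.0, 5.0)


def runtime_range(steps):
    """Return the (best case, worst case) runtime in seconds for `steps` actions."""
    low, high = SECONDS_PER_ACTION
    return steps * low, steps * high


def max_steps_for(budget_seconds):
    """Largest step count whose worst-case runtime still fits in the budget."""
    return int(budget_seconds // SECONDS_PER_ACTION[1])


if __name__ == "__main__":
    print(runtime_range(10))
    print(max_steps_for(60))
```

The resulting number is a candidate step cap for the agent. Browser Use is understood to accept a step limit when the agent is run, but the exact parameter name should be checked against the documentation for the installed version.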
FAQ
Which LLMs does Browser Use work with?
Browser Use works with any LLM that supports vision inputs. This includes OpenAI GPT-4o, Anthropic Claude, Google Gemini, and local models via Ollama. The LLM needs vision capability to interpret browser screenshots.
How does Browser Use differ from Playwright?
Playwright is a deterministic browser automation library where you write explicit scripts. Browser Use is an AI-driven approach where the agent decides what to do based on what it sees. Use Playwright for predictable, repeatable tasks. Use Browser Use for dynamic tasks where the page layout may change.
Can Browser Use handle multi-step workflows?
Yes. You describe the full workflow in the task string, and the agent executes the steps in sequence: navigate, fill forms, click buttons, extract data. The agent maintains context across steps.
Is Browser Use suitable for web scraping?
It works for scraping dynamic content that requires JavaScript rendering and interaction. For simple static pages, traditional scrapers like BeautifulSoup are faster and cheaper. Browser Use is best for sites that require login, navigation, or interaction.
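To illustrate the static-page case, here is a minimal sketch of link extraction with no browser and no LLM involved. The text names BeautifulSoup; this example swaps in Python's standard-library `html.parser` to stay dependency-free. The class name, function name, and sample HTML are all hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a static HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Parse an HTML string and return its link targets in document order."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


# Illustrative markup only; a real scraper would fetch the page first.
SAMPLE = (
    '<ul><li><a href="/docs">Docs</a></li>'
    '<li><a href="/blog">Blog</a></li>'
    '<li><a>no href</a></li></ul>'
)

if __name__ == "__main__":
    print(extract_links(SAMPLE))
```

When the data is already present in the served HTML like this, a parser is enough. An agent-driven browser earns its cost only when the content appears after login, clicks, or JavaScript rendering.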
How much does a run cost?
Each step sends a screenshot to the LLM, consuming image tokens. A typical 10-step workflow with GPT-4o costs approximately $0.10-0.30 depending on screenshot resolution and prompt complexity. Lowering the screenshot resolution reduces the cost.
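The quoted figure can be scaled to other workflow lengths for a rough budget. This sketch assumes cost grows linearly with the number of steps, which is a simplification: prompt history tends to grow as a run proceeds, so long runs may cost more per step. The function name is hypothetical and the only inputs are the $0.10-0.30 per 10 steps stated above.

```python
# From the text: a typical 10-step GPT-4o workflow costs roughly $0.10-0.30.
REFERENCE_STEPS = 10
REFERENCE_COST_USD = (0.10, 0.30)


def estimate_cost(steps):
    """Return a (low, high) USD estimate, assuming cost scales linearly with steps."""
    scale = steps / REFERENCE_STEPS
    low, high = REFERENCE_COST_USD
    return round(low * scale, 4), round(high * scale, 4)


if __name__ == "__main__":
    low, high = estimate_cost(25)
    print(f"25 steps: ${low:.2f} to ${high:.2f}")
```

Treat the output as an order-of-magnitude guide and compare it with the provider's actual billing after a few runs.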
Sources (3)
- Browser Use GitHub -- Browser Use provides AI agent browser automation with vision-based detection
- Browser Use Docs -- Vision-based web interaction for AI agents
- Playwright -- Playwright browser automation framework
Source and credits
browser-use/browser-use — 50k+ stars, MIT
Related assets
Notte — Browser Automation MCP for AI Agents
MCP server that turns web browsers into AI agent tools. Notte provides structured browser actions like click, type, navigate, and extract for LLM-driven automation.
Browser-Use Web UI — Visual AI Browser Automation
Gradio-based web interface for Browser-Use AI agent. Automate web browsing with visual feedback, persistent sessions, and HD recording. Supports 6+ LLM providers. 15,800+ stars, MIT.
Playwright MCP — Browser Automation Server
Playwright MCP is an MCP server for browser automation via Playwright snapshots. Add via npx in Claude Code/Codex to run deterministic actions.
Browserbase MCP — Cloud Browser Automation Tools
Browserbase MCP server exposes automation tools (navigate, act, observe, extract) backed by Browserbase + Stagehand, letting agents operate real web pages.