MCP Configs · April 7, 2026 · 1 min read

Browser Use — AI Agent Browser Automation

Let AI agents control web browsers with natural language. Browser Use provides vision-based element detection, multi-tab support, and works with any LLM provider.

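This asset is staged safely first. The copied instructions ask the agent to read the staged files and to confirm before activating any scripts, MCP configuration, or global configuration.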

Stage only · 17/100 · Policy: staging required
Agent entry: any MCP/CLI agent
Type: MCP Config
Install: Stage only
Trust level: Community
Safe staging command:
npx -y tokrepo@latest install 3d04e209-6c1a-4608-8e43-95b2cd7316d5 --target codex

Stage the files first; read the staged README and install plan before activating.

TL;DR
Browser Use gives AI agents vision-based browser control with multi-tab and multi-LLM support.
§01

What it is

Browser Use is a Python library that lets AI agents control web browsers using natural language instructions. It provides vision-based element detection (the agent sees the page as a screenshot), multi-tab support, and works with any LLM provider including OpenAI, Anthropic, and local models.

Browser Use targets developers building AI agents that need to interact with web applications: filling forms, navigating dashboards, scraping dynamic content, or automating workflows that lack APIs.

§02

How it saves time or tokens

Browser Use handles the complexity of browser automation (DOM parsing, element location, screenshot capture, action execution) behind a simple Python API. Instead of writing Playwright scripts for every web interaction, the agent describes what to do in natural language and Browser Use translates that into browser actions.

The vision-based approach means the agent works with any website without needing CSS selectors or XPaths.

§03

How to use

  1. Install Browser Use: pip install browser-use
  2. Set up your LLM provider API key
  3. Create an agent with a task description
  4. Run the agent and watch it navigate the browser
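
Step 2 usually means exporting the provider key as an environment variable before the script starts. A minimal sketch, assuming the OpenAI provider and its conventional `OPENAI_API_KEY` variable; the helper name `require_api_key` is hypothetical and not part of Browser Use:

```python
import os


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the environment variable `name`.

    Hypothetical helper (not part of Browser Use): fail fast with a clear
    message instead of letting the agent start and fail on its first LLM call.
    """
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running the agent")
    return key
```

Calling `require_api_key()` at the top of the script surfaces a missing key before any browser window opens.
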
§04

Example

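import asyncio

from browser_use import Agent
from langchain_openai import ChatOpenAI


async def main():
    # Describe the whole task in natural language; the agent plans the browser actions.
    agent = Agent(
        task='Go to google.com, search for browser automation tools, and extract the top 5 results',
        llm=ChatOpenAI(model='gpt-4o'),
    )
    # run() drives the browser until the task finishes and returns the run result.
    result = await agent.run()
    print(result)


if __name__ == '__main__':
    asyncio.run(main())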

The agent opens a browser, navigates to Google, types the search query, reads results, and returns structured data.

§05

Common pitfalls

  • Vision-based detection is slower than DOM-based selectors; expect 2-5 seconds per action
  • CAPTCHAs and bot detection can block automated browsing; Browser Use does not bypass these protections
  • Token usage is high because screenshots are sent to the LLM on every step; limit the number of steps for cost control
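
The 2-5 seconds per action figure above gives a rough way to budget wall-clock time, and a step cap bounds both latency and token spend. A small sketch of that arithmetic; the function names are illustrative, and the per-action range is the estimate quoted above rather than a measured value:

```python
def estimate_run_seconds(steps: int, low: float = 2.0, high: float = 5.0) -> tuple[float, float]:
    """Return (best_case, worst_case) seconds for a run of `steps` actions."""
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return steps * low, steps * high


def max_steps_for_deadline(deadline_seconds: float, high: float = 5.0) -> int:
    """Largest step count that still fits the deadline in the worst case."""
    return max(0, int(deadline_seconds // high))
```

Recent Browser Use releases accept a `max_steps` argument on `agent.run()`, which is where such a cap would go; check the installed version before relying on it.
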

Frequently asked questions

Which LLM providers does Browser Use support?

Browser Use works with any LLM that supports vision inputs. This includes OpenAI GPT-4o, Anthropic Claude, Google Gemini, and local models via Ollama. The LLM needs vision capability to interpret browser screenshots.

How does Browser Use compare to Playwright?

Playwright is a deterministic browser automation library where you write explicit scripts. Browser Use is an AI-driven approach where the agent decides what to do based on what it sees. Use Playwright for predictable, repeatable tasks. Use Browser Use for dynamic tasks where the page layout may change.

Can Browser Use handle multi-step workflows?

Yes. You describe the full workflow in the task string, and the agent executes multiple steps sequentially: navigate, fill forms, click buttons, extract data. The agent maintains context across steps.
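
Long workflows are easier to review when the task string is assembled from an explicit list of steps. A minimal sketch; `build_task` is a hypothetical helper, and the numbered-steps wording is one reasonable prompt style, not a format Browser Use requires:

```python
def build_task(goal: str, steps: list[str]) -> str:
    """Combine a goal and ordered steps into a single task string."""
    lines = [goal.strip(), "Follow these steps in order:"]
    lines += [f"{i}. {step.strip()}" for i, step in enumerate(steps, start=1)]
    return "\n".join(lines)


# Example workflow (made up for illustration).
task = build_task(
    "Export last month's invoices from the billing dashboard.",
    ["Log in with the saved session", "Open the Invoices page", "Filter by last month", "Download the CSV"],
)
```

The resulting string is what would be passed as `task=` when constructing the `Agent`.
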

Is Browser Use suitable for web scraping?

It works for scraping dynamic content that requires JavaScript rendering and interaction. For simple static pages, traditional scrapers like BeautifulSoup are faster and cheaper. Browser Use is best for sites that require login, navigation, or interaction.
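
For the static-page case, a plain HTML parser is enough and involves no LLM calls. The sketch below uses Python's standard-library `html.parser` as a stand-in for BeautifulSoup so it runs without extra installs; the sample HTML and the `LinkCollector` class are made up for illustration:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> tag in static HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []    # finished (href, text) pairs
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


SAMPLE_HTML = '<ul><li><a href="/a">First</a></li><li><a href="/b">Second <b>item</b></a></li></ul>'

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
print(collector.links)  # [('/a', 'First'), ('/b', 'Second item')]
```
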

How much does Browser Use cost in API tokens?

Each step sends a screenshot to the LLM, consuming image tokens. A typical 10-step workflow with GPT-4o costs approximately $0.10-0.30 depending on screenshot resolution and prompt complexity. Configure lower resolution to reduce costs.
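
The figures above imply roughly $0.01-0.03 per step for GPT-4o. A back-of-the-envelope sketch that turns this into a budget check, working in integer cents to avoid floating-point drift; the per-step range is derived from the estimate in this answer, not from published pricing, and the function names are illustrative:

```python
# Per-step cost in cents, derived from "$0.10-0.30 for a 10-step workflow".
LOW_CENTS_PER_STEP = 1
HIGH_CENTS_PER_STEP = 3


def estimate_cost_cents(steps: int) -> tuple[int, int]:
    """Return (low, high) estimated cost in cents for `steps` agent steps."""
    return steps * LOW_CENTS_PER_STEP, steps * HIGH_CENTS_PER_STEP


def steps_within_budget(budget_cents: int) -> int:
    """Step cap that keeps the worst-case estimate within the budget."""
    return budget_cents // HIGH_CENTS_PER_STEP


low, high = estimate_cost_cents(25)
print(f"25 steps: ${low / 100:.2f}-${high / 100:.2f}")  # 25 steps: $0.25-$0.75
print(steps_within_budget(100))  # 33
```
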

Sources and acknowledgments

browser-use/browser-use — 50k+ stars, MIT
