MCP Configs · April 7, 2026 · 1 min read

Browser Use — AI Agent Browser Automation

Let AI agents control web browsers with natural language. Browser Use provides vision-based element detection, multi-tab support, and works with any LLM provider.

Browser Use · Community

Agent ready

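This asset is staged safely

This asset is staged safely first. The copied instructions ask the agent to read the staged files and to confirm before activating any scripts, MCP configuration, or global configuration.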

Stage only · 17/100 · Policy: staging required

Agent entry point

Any MCP/CLI agent

Type

MCP Config

Install

Stage only

Trust

Trust level: Community

Entry point

Browser Use — AI Agent Browser Automation

Safe staging command

npx -y tokrepo@latest install 3d04e209-6c1a-4608-8e43-95b2cd7316d5 --target codex

Files are staged first; before activating, read the staged README and the install plan.

TL;DR

Browser Use gives AI agents vision-based browser control with multi-tab and multi-LLM support.

§01

What it is

Browser Use is a Python library that lets AI agents control web browsers using natural language instructions. It provides vision-based element detection (the agent sees the page as a screenshot), multi-tab support, and works with any LLM provider including OpenAI, Anthropic, and local models.

Browser Use targets developers building AI agents that need to interact with web applications: filling forms, navigating dashboards, scraping dynamic content, or automating workflows that lack APIs.

§02

How it saves time or tokens

Browser Use handles the complexity of browser automation (DOM parsing, element location, screenshot capture, action execution) behind a simple Python API. Instead of writing Playwright scripts for every web interaction, the agent describes what to do in natural language and Browser Use translates that into browser actions.

The vision-based approach means the agent works with any website without needing CSS selectors or XPaths.
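The translation from intent to browser actions can be pictured as a dispatch table. The model replies with a structured action, and the framework maps that action onto a browser operation. The sketch below is a toy illustration and does not reflect how Browser Use is implemented internally. RecordingBrowser, the action names, and the JSON reply shape are all hypothetical. A real driver would call Playwright or CDP instead of logging calls.

```python
import json
from dataclasses import dataclass, field


@dataclass
class RecordingBrowser:
    """Hypothetical stand-in for a real browser driver; it only logs calls."""
    calls: list = field(default_factory=list)

    def goto(self, url: str) -> None:
        self.calls.append(("goto", url))

    def click(self, index: int) -> None:
        self.calls.append(("click", index))

    def type_text(self, index: int, text: str) -> None:
        self.calls.append(("type", index, text))


def dispatch(browser: RecordingBrowser, llm_output: str) -> bool:
    """Apply one model-chosen action; return True when the model says it is done.

    Assumes (hypothetically) the model replies with JSON such as
    {"action": "click", "index": 3}.
    """
    step = json.loads(llm_output)
    action = step["action"]
    if action == "goto":
        browser.goto(step["url"])
    elif action == "click":
        browser.click(step["index"])
    elif action == "type":
        browser.type_text(step["index"], step["text"])
    elif action == "done":
        return True
    else:
        raise ValueError(f"unknown action: {action!r}")
    return False


if __name__ == "__main__":
    browser = RecordingBrowser()
    # Canned replies standing in for successive LLM outputs.
    replies = [
        '{"action": "goto", "url": "https://www.google.com"}',
        '{"action": "type", "index": 0, "text": "browser automation tools"}',
        '{"action": "click", "index": 1}',
        '{"action": "done"}',
    ]
    for reply in replies:
        if dispatch(browser, reply):
            break
    print(browser.calls)
```

Rejecting unknown actions loudly, rather than ignoring them, keeps a confused model from silently skipping steps.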

§03

How to use

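Install Browser Use: pip install browser-use
Install a Chromium browser for it to drive if you do not already have one (Playwright-based releases use: playwright install chromium)
Set your LLM provider's API key (for OpenAI, the OPENAI_API_KEY environment variable)
Create an agent with a task description
Run the agent and watch it navigate the browser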
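A missing API key often shows up only when the agent makes its first model call, after the browser has already started. A small pre-flight check fails fast instead. This is a minimal sketch, and require_env is a hypothetical helper rather than part of Browser Use.

```python
import os


def require_env(name: str) -> str:
    """Return a non-empty environment variable or raise with a clear message.

    Hypothetical helper: call it before constructing the agent, e.g.
    require_env("OPENAI_API_KEY") when using langchain_openai.ChatOpenAI,
    which reads that variable by default.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the agent")
    return value
```

Call it once at startup, for example `require_env("OPENAI_API_KEY")`, before creating the Agent.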

§04

Example

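```python
import asyncio

from browser_use import Agent
from langchain_openai import ChatOpenAI  # reads OPENAI_API_KEY from the environment


async def main():
    # Describe the whole job in plain language; the agent plans the clicks and typing.
    agent = Agent(
        task="Go to google.com, search for browser automation tools, and extract the top 5 results",
        llm=ChatOpenAI(model="gpt-4o"),
    )
    # run() drives the browser step by step and returns the run's result.
    result = await agent.run()
    print(result)


if __name__ == "__main__":
    asyncio.run(main())
```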

The agent opens a browser, navigates to Google, types the search query, reads results, and returns structured data.
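Each step pairs a model call with a browser action, so a run can take a while. One way to bound wall-clock time is to wrap the coroutine in asyncio.wait_for. In this sketch, a dummy coroutine stands in for agent.run(), so the code runs without a browser or an API key. The names run_with_deadline and fake_agent_run are hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Optional


async def run_with_deadline(task: Awaitable[Any], seconds: float) -> Optional[Any]:
    """Await `task`, returning None if it does not finish within `seconds`.

    asyncio.wait_for cancels the wrapped task on timeout, so a stuck run
    does not keep going in the background.
    """
    try:
        return await asyncio.wait_for(task, timeout=seconds)
    except asyncio.TimeoutError:
        return None


async def fake_agent_run(delay: float) -> str:
    # Stand-in for `await agent.run()`: sleep, then return a result string.
    await asyncio.sleep(delay)
    return "top 5 results"


if __name__ == "__main__":
    print(asyncio.run(run_with_deadline(fake_agent_run(0.01), seconds=1.0)))
    print(asyncio.run(run_with_deadline(fake_agent_run(1.0), seconds=0.05)))
```

With Browser Use itself, the call would look like `await run_with_deadline(agent.run(), seconds=300)`. The timeout value here is arbitrary.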

§05

Related on TokRepo

Browser automation tools -- AI-powered browser control
Web scraping tools -- Data extraction from websites

§06

Common pitfalls

Vision-based detection is slower than DOM-based selectors; expect 2-5 seconds per action
CAPTCHAs and bot detection can block automated browsing; Browser Use does not bypass these protections
Token usage is high because screenshots are sent to the LLM on every step; limit the number of steps for cost control
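A small guard around the loop that drives an agent can enforce a step cap, with a token cap alongside it. The sketch below is generic. StepBudget and BudgetExceeded are hypothetical names and not part of Browser Use, and the per-step token figure is a placeholder. Recent Browser Use releases also accept a step limit directly, e.g. `await agent.run(max_steps=10)`, but check the docs for your version.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when an agent run would go past its step or token limit."""


@dataclass
class StepBudget:
    """Hypothetical guard capping steps and tokens for one agent run."""
    max_steps: int
    max_tokens: int
    steps: int = 0
    tokens: int = 0

    def charge(self, tokens_this_step: int) -> None:
        """Record one step; raise BudgetExceeded if either cap would be passed."""
        if self.steps + 1 > self.max_steps:
            raise BudgetExceeded(f"step limit {self.max_steps} reached")
        if self.tokens + tokens_this_step > self.max_tokens:
            raise BudgetExceeded(f"token limit {self.max_tokens} reached")
        self.steps += 1
        self.tokens += tokens_this_step


if __name__ == "__main__":
    budget = StepBudget(max_steps=10, max_tokens=20_000)
    try:
        for _ in range(50):
            budget.charge(1_500)  # placeholder: screenshot + prompt tokens per step
    except BudgetExceeded as err:
        print(err, budget.steps, budget.tokens)
```

The guard checks both caps before recording a step, so a refused step never counts against the budget.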

FAQ

Which LLM providers does Browser Use support?

Browser Use works with any LLM that supports vision inputs. This includes OpenAI GPT-4o, Anthropic Claude, Google Gemini, and local models via Ollama. The LLM needs vision capability to interpret browser screenshots.

How does Browser Use compare to Playwright?

Playwright is a deterministic browser automation library where you write explicit scripts. Browser Use is an AI-driven approach where the agent decides what to do based on what it sees. Use Playwright for predictable, repeatable tasks. Use Browser Use for dynamic tasks where the page layout may change.

Can Browser Use handle multi-step workflows?

Yes. You describe the full workflow in the task string, and the agent executes multiple steps sequentially: navigate, fill forms, click buttons, extract data. The agent maintains context across steps.
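Because the whole workflow lives in the task string, it can help to assemble that string from explicit, numbered steps. Both the agent and a human reviewer then see the intended order. This is a minimal sketch. build_task is a hypothetical helper, and its return value is what you would pass as task= to Agent.

```python
from typing import Optional, Sequence


def build_task(steps: Sequence[str], goal: Optional[str] = None) -> str:
    """Join workflow steps into one numbered task description."""
    cleaned = [s.strip() for s in steps if s.strip()]  # drop blank steps
    if not cleaned:
        raise ValueError("at least one non-empty step is required")
    lines = [f"{i}. {step}" for i, step in enumerate(cleaned, start=1)]
    if goal:
        lines.append(f"Finally, {goal.strip()}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_task(
        ["Open https://example.com/login", "Log in with the test account", "Open the Reports tab"],
        goal="return the title of the newest report.",
    ))
```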

Is Browser Use suitable for web scraping?

It works for scraping dynamic content that requires JavaScript rendering and interaction. For simple static pages, traditional scrapers like BeautifulSoup are faster and cheaper. Browser Use is best for sites that require login, navigation, or interaction.
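For comparison, a static page needs no browser at all. The sketch below uses Python's standard-library html.parser in place of BeautifulSoup, chosen only so the code runs without extra packages. It pulls link text and URLs from HTML you have already fetched. Fetching is left out (urllib.request.urlopen would do), and LinkExtractor and extract_links are illustrative names.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LinkExtractor(HTMLParser):
    """Collect (text, href) pairs for every <a href=...> in static HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # anchors without href are ignored
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:  # only keep text inside a tracked link
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            self.links.append((text, self._href))
            self._href = None


def extract_links(html: str) -> List[Tuple[str, str]]:
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, `extract_links('<a href="/a">First <b>tool</b></a>')` yields `[("First tool", "/a")]`. No screenshots or model calls are involved, which is why this route is cheaper when the content is already in the HTML.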

How much does Browser Use cost in API tokens?

Each step sends a screenshot to the LLM, consuming image tokens. A typical 10-step workflow with GPT-4o costs approximately $0.10-0.30 depending on screenshot resolution and prompt complexity. Configure lower resolution to reduce costs.
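The effect of resolution can be made concrete. OpenAI has published a rule for high-detail image inputs to GPT-4o-class models; check its current vision pricing docs, since the rule may change. The rule has three steps. First, fit the image within a 2048×2048 square. Next, scale it so the shortest side is 768 px. Finally, charge 85 base tokens plus 170 tokens per 512-px tile. The sketch below implements that rule, assuming images are only ever scaled down, never up, and turns it into a rough per-run estimate. The text-token counts and per-million-token prices are placeholders to fill in from your provider's pricing. Other providers count image tokens differently.

```python
import math
from typing import Tuple


def gpt4o_image_tokens(width: int, height: int) -> int:
    """Estimate high-detail image tokens with OpenAI's tile rule.

    Assumption: images are only scaled down, never up.
    """
    # 1) Fit within a 2048 x 2048 square, keeping the aspect ratio.
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    # 2) Shrink so the shortest side is at most 768 px.
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale
    # 3) 170 tokens per 512-px tile plus a flat 85.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles


def estimate_run_cost(
    steps: int,
    screenshot: Tuple[int, int],
    text_tokens_per_step: int,
    output_tokens_per_step: int,
    usd_per_m_input: float,
    usd_per_m_output: float,
) -> float:
    """Rough USD cost of a run in which every step sends one screenshot plus text."""
    input_tokens = steps * (gpt4o_image_tokens(*screenshot) + text_tokens_per_step)
    output_tokens = steps * output_tokens_per_step
    return (input_tokens * usd_per_m_input + output_tokens * usd_per_m_output) / 1_000_000


if __name__ == "__main__":
    for size in [(1280, 720), (640, 360)]:
        print(size, gpt4o_image_tokens(*size))
```

Under these assumptions, halving each side of a 1280×720 screenshot cuts it from 6 tiles (1105 tokens) to 2 tiles (425 tokens) per step.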


Source and acknowledgments

browser-use/browser-use — 50k+ stars, MIT


Related assets

Notte — Browser Automation MCP for AI Agents

Browser-Use Web UI — Visual AI Browser Automation

Playwright MCP — Browser Automation Server

Browserbase MCP — Cloud Browser Automation Tools