Scripts · April 2, 2026 · 1 min read

Browser Use — AI Agent Browser Automation

Make any website accessible to AI agents. Automate browser tasks with LLMs — click, type, navigate, extract data. 70K+ stars, MIT licensed.

TokRepo Picks · Community
## Quick Start

Try it first, then decide whether to dig deeper.


```bash
pip install browser-use
playwright install chromium
```

```python
import asyncio

from browser_use import Agent
from langchain_openai import ChatOpenAI


async def main():
    agent = Agent(
        task="Go to reddit.com/r/python, find the top post today, and summarize it",
        llm=ChatOpenAI(model="gpt-4o"),
    )
    result = await agent.run()
    print(result)


# `await` is only valid inside a coroutine, so drive it with asyncio.run
asyncio.run(main())
```

Set your API key: `export OPENAI_API_KEY=sk-...` (or use Anthropic, Gemini, etc.)
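The single-agent pattern above can be extended to a small batch of tasks. The following is a minimal sketch, not an API provided by the library: `run_browser_task` and `run_batch` are hypothetical helper names, the concurrency limit of 2 is an arbitrary assumption, and the `Agent` call simply mirrors the quick-start snippet. The imports are deferred so the batching logic can be tried without the browser stack installed.

```python
import asyncio
from typing import Awaitable, Callable, List, Union


async def run_browser_task(task: str) -> str:
    """Hypothetical helper: run one task using the quick-start pattern."""
    # Deferred imports: only needed when a real agent is actually run.
    from browser_use import Agent
    from langchain_openai import ChatOpenAI

    agent = Agent(task=task, llm=ChatOpenAI(model="gpt-4o"))
    return str(await agent.run())


async def run_batch(
    tasks: List[str],
    run_one: Callable[[str], Awaitable[str]] = run_browser_task,
    max_concurrent: int = 2,  # assumption: tune to your machine and rate limits
) -> List[Union[str, BaseException]]:
    """Run tasks concurrently, with at most max_concurrent in flight.

    Results keep the order of `tasks`; a failed task yields its exception
    object instead of aborting the rest of the batch.
    """
    limiter = asyncio.Semaphore(max_concurrent)

    async def guarded(task: str) -> str:
        async with limiter:
            return await run_one(task)

    return await asyncio.gather(
        *(guarded(t) for t in tasks), return_exceptions=True
    )
```

Calling `asyncio.run(run_batch(["task one", "task two"]))` would start one agent per task. Passing a different `run_one` coroutine lets you swap in another model or a stub for testing.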
## Introduction

Browser Use is an **open-source library that makes websites accessible to AI agents**. It bridges the gap between LLMs and real web browsers, enabling agents to autonomously navigate pages, fill forms, click buttons, extract data, and complete multi-step workflows.

Core capabilities:

- **Vision + HTML Extraction**: Combines visual understanding with DOM analysis for robust element detection, even on complex dynamic pages
- **Multi-Tab Management**: Agents can open, switch between, and manage multiple browser tabs simultaneously
- **Automatic Error Recovery**: Self-correcting agents that handle popups, CAPTCHAs, and unexpected page states
- **Parallel Agents**: Run multiple browser agents concurrently for batch processing tasks
- **Custom Actions**: Define reusable browser actions (save to file, send notification, call API) that agents can invoke
- **LLM Agnostic**: Works with OpenAI, Anthropic Claude, Google Gemini, DeepSeek, and any LangChain-compatible model
- **Session Persistence**: Connect to existing Chrome sessions with cookies and login state preserved

70,000+ GitHub stars. Used for web scraping, form automation, testing, data collection, and building autonomous web agents.

## FAQ

**Q: How does Browser Use differ from Playwright or Selenium?**
A: Playwright/Selenium require you to write explicit selectors and step-by-step scripts. Browser Use lets you describe tasks in natural language, and the AI agent figures out how to interact with the page autonomously.

**Q: Does it work with sites that require login?**
A: Yes. You can either let the agent log in with credentials, or connect to an existing Chrome session where you're already authenticated.

**Q: Can I run it headless (no visible browser)?**
A: Yes. Pass `headless=True` to the Browser config. This is useful for server-side automation and CI/CD pipelines.

**Q: How much does it cost to run?**
A: Cost depends on the LLM provider. Each page interaction typically uses 1-3K tokens. A typical 10-step task costs ~$0.05-0.15 with GPT-4o.

## Works With

- OpenAI / Anthropic / Google / DeepSeek / any LangChain LLM
- Playwright (Chromium) for browser control
- Python 3.11+ async/await
- Docker for containerized deployment
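The cost range quoted in the FAQ can be checked with a short back-of-the-envelope calculation. This sketch assumes a single blended rate of $5 per million tokens. That figure is an assumption, chosen because it reproduces the quoted range; real pricing differs by provider and by input versus output tokens. `estimate_task_cost` is a hypothetical helper, not part of Browser Use.

```python
def estimate_task_cost(
    steps: int, tokens_per_step: int, usd_per_million_tokens: float
) -> float:
    """Return the estimated USD cost of a task with a flat per-token rate."""
    total_tokens = steps * tokens_per_step
    return total_tokens * usd_per_million_tokens / 1_000_000


# Assumption: blended rate of $5 per million tokens (check your provider's pricing).
ASSUMED_RATE = 5.0

# 10 steps at 1K and 3K tokens per page interaction, as in the FAQ.
low = estimate_task_cost(10, 1_000, ASSUMED_RATE)
high = estimate_task_cost(10, 3_000, ASSUMED_RATE)
print(f"${low:.2f} to ${high:.2f}")  # prints "$0.05 to $0.15"
```

Swapping in your provider's actual rate, or separate input and output rates, gives a closer estimate for your own workload.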

## Source and Thanks

- GitHub: [browser-use/browser-use](https://github.com/browser-use/browser-use)
- License: MIT
- Stars: 70,000+
- Maintainer: Browser Use team

Thanks to the Browser Use team for creating the most popular open-source browser automation framework for AI agents, making the web programmable through natural language.
