LaVague — Natural Language Web Automation
Give LaVague a text objective and it drives the browser to accomplish it. It is a Large Action Model framework for web agents. 6.3K+ stars.
What it is
LaVague is an open-source framework for building AI web agents that interact with websites through natural language instructions. You give it a text objective like 'Find the cheapest flight from NYC to London on this travel site,' and LaVague drives a real browser (via Selenium) to accomplish the task. It uses a World Model to understand the page and an Action Engine to generate and execute browser actions.
It targets developers building browser automation, web testing, or data collection workflows who want to replace brittle CSS selectors and XPath with natural language instructions.
How it saves time or tokens
LaVague replaces traditional browser automation scripts that break when a website changes its layout. Instead of maintaining fragile selectors, you describe what you want in plain English. Because the Action Engine generates actions from the page as it looks at run time, it can often absorb layout changes without script edits. For repetitive web tasks (filling forms, extracting data, navigating multi-step flows), LaVague can reduce the code you write from dozens of lines to a single objective statement.
How to use
- Install LaVague:
pip install lavague
- Write a web agent:
from lavague.core import WorldModel, ActionEngine
from lavague.core.agents import WebAgent
from lavague.drivers.selenium import SeleniumDriver
# Launch a visible browser; pass headless=True on servers
driver = SeleniumDriver(headless=False)
# The agent reaches the driver through the ActionEngine
agent = WebAgent(
    WorldModel(),
    ActionEngine(driver),
)
- Run the agent with an objective:
agent.get('https://news.ycombinator.com')
agent.run('Click on the top story and summarize the comments')
Example
from lavague.core import WorldModel, ActionEngine
from lavague.core.agents import WebAgent
from lavague.drivers.selenium import SeleniumDriver
driver = SeleniumDriver(headless=False)
# The agent reaches the driver through the ActionEngine
agent = WebAgent(WorldModel(), ActionEngine(driver))
# Navigate to a site and perform a multi-step task
agent.get('https://github.com/trending')
agent.run('Find the top trending Python repository and open its README')
# The agent will:
# 1. Analyze the trending page layout
# 2. Identify the top Python repository
# 3. Click through to the repository page
# 4. Scroll to or click on the README section
Common pitfalls
- LaVague runs a real browser, which requires Selenium and a Chrome/Chromium installation. Headless mode works for servers, but some sites detect and block headless browsers.
- Complex multi-page flows may require breaking the objective into smaller steps. A single vague objective can lead to incorrect actions on complex websites.
- LLM API costs apply for each page interaction. Pages with many elements generate longer prompts. Set token budgets for cost control in production automation.
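The last two pitfalls can be handled together by splitting one broad objective into smaller ones and capping how many get run. The sketch below is a minimal illustration, not part of LaVague: `run_in_steps`, `BudgetExceeded`, and `RecordingAgent` are hypothetical names, the step cap is only a crude stand-in for a real token budget, and the only assumption about the agent is that it exposes `run(objective)` as in the examples above.

```python
from typing import List, Protocol, Sequence


class RunsObjectives(Protocol):
    """Anything exposing run(objective); a LaVague WebAgent is assumed to fit."""

    def run(self, objective: str) -> object: ...


class BudgetExceeded(RuntimeError):
    """Raised when another step would go past the allowed number of steps."""


def run_in_steps(
    agent: RunsObjectives, objectives: Sequence[str], max_steps: int
) -> List[str]:
    """Run each sub-objective in order and return the ones that were run.

    Raises BudgetExceeded before starting a step that would exceed max_steps.
    """
    completed: List[str] = []
    for objective in objectives:
        if len(completed) >= max_steps:
            raise BudgetExceeded(
                f"budget of {max_steps} steps reached before: {objective!r}"
            )
        agent.run(objective)
        completed.append(objective)
    return completed


class RecordingAgent:
    """Hypothetical stand-in for a real agent; it only records what it is asked."""

    def __init__(self) -> None:
        self.seen: List[str] = []

    def run(self, objective: str) -> None:
        self.seen.append(objective)


steps = [
    "Open the trending page filtered to Python",
    "Click the first repository in the list",
    "Scroll to the README section",
]
demo_agent = RecordingAgent()
finished = run_in_steps(demo_agent, steps, max_steps=5)
```

With a real agent, each short objective gives the model less room to misread the page, and the cap bounds how many LLM-backed steps a single job can trigger.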
Frequently Asked Questions
Which browser does LaVague use?
LaVague uses Selenium WebDriver with Chrome or Chromium by default. The SeleniumDriver handles browser lifecycle, navigation, and action execution. You can run it in headless mode for server environments or with a visible browser for debugging and development.
How does LaVague understand a web page?
LaVague uses a World Model that takes a screenshot and the page DOM, then sends them to a vision-capable LLM to understand the page layout and content. The Action Engine then generates specific browser actions (click, type, scroll) based on the World Model's understanding.
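The division of labor described here can be pictured as a loop: observe the page, let the world model pick the next instruction, let the action engine turn it into a concrete action, then execute it. The following is a conceptual sketch only. It does not reproduce LaVague's internals, and `Observation`, `Action`, `run_agent`, and the toy functions are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Observation:
    """What the world model looks at: a DOM snapshot and a screenshot."""
    dom: str
    screenshot: bytes


@dataclass
class Action:
    """A browser action such as click, type, or scroll."""
    kind: str
    target: str


def run_agent(
    objective: str,
    observe: Callable[[], Observation],
    world_model: Callable[[str, Observation], Optional[str]],
    action_engine: Callable[[str, Observation], Action],
    execute: Callable[[Action], None],
    max_steps: int = 5,
) -> List[Action]:
    """Observe, plan, act, and repeat until the world model returns None."""
    history: List[Action] = []
    for _ in range(max_steps):
        obs = observe()
        instruction = world_model(objective, obs)
        if instruction is None:  # the world model considers the objective done
            break
        action = action_engine(instruction, obs)
        execute(action)
        history.append(action)
    return history


# Toy stand-ins so the loop runs without a browser or an LLM.
executed: List[Action] = []


def observe() -> Observation:
    return Observation(dom="<a id='top'>Top story</a>", screenshot=b"")


def toy_world_model(objective: str, obs: Observation) -> Optional[str]:
    # Stop as soon as one action has been executed.
    return None if executed else "Click the link with id 'top'"


def toy_action_engine(instruction: str, obs: Observation) -> Action:
    return Action(kind="click", target="top")


history = run_agent(
    "Open the top story", observe, toy_world_model, toy_action_engine, executed.append
)
```

Keeping planning and action generation behind separate callables mirrors the point made above: a vision model can handle the first role and a cheaper text model the second.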
Can LaVague work with sites that require login?
Yes. You can pre-authenticate by navigating to the login page and providing credentials through the agent, or by loading cookies from a previous session. LaVague supports standard Selenium cookie management for maintaining authenticated sessions.
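The cookie route can be sketched with Selenium's standard `get_cookies()`, `add_cookie()`, `get()`, and `refresh()` methods. The helper names `save_cookies` and `load_cookies` are hypothetical, and how you obtain the underlying Selenium WebDriver from LaVague's `SeleniumDriver` is left as an assumption to check against the LaVague documentation.

```python
import json
from pathlib import Path


def save_cookies(driver, path):
    """Write the current session's cookies to a JSON file."""
    Path(path).write_text(json.dumps(driver.get_cookies()))


def load_cookies(driver, path, url):
    """Restore cookies saved earlier by save_cookies.

    Selenium only accepts cookies for the domain that is currently loaded,
    so the target page is opened first and refreshed afterwards so the
    restored session takes effect.
    """
    driver.get(url)
    for cookie in json.loads(Path(path).read_text()):
        driver.add_cookie(cookie)
    driver.refresh()
```

A typical flow would be to log in once by hand with a visible browser, call `save_cookies`, and then call `load_cookies` at the start of later runs. Treat the cookie file like a credential, since anyone who has it can reuse the session.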
Which LLM providers does LaVague support?
LaVague supports OpenAI, Anthropic, and other LLM providers. The World Model typically uses a vision model (like GPT-4o) to interpret screenshots, while the Action Engine can use a text model to generate actions. You configure the provider through environment variables.
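Since configuration comes from environment variables, a small preflight check can fail fast before a browser is launched. This is a sketch under assumptions: `missing_api_keys` is a hypothetical helper, and `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` are the variable names conventionally used by those providers' SDKs. Which variables your LaVague setup actually reads should be confirmed in its documentation.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_api_keys(
    required: Iterable[str], env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return the names in `required` that are unset or empty in `env`.

    `env` defaults to the process environment; passing a dict makes the
    check easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Assumed variable name for an OpenAI-backed setup.
missing = missing_api_keys(["OPENAI_API_KEY"])
if missing:
    print("Set these variables before running the agent: " + ", ".join(missing))
```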
Can LaVague be used for web scraping?
LaVague can perform web scraping tasks, but it is optimized for interactive browser automation rather than high-volume data extraction. For scraping hundreds of pages, a traditional scraping library (like Scrapy) is more efficient. LaVague excels at complex interactive flows that are hard to script with selectors.
Source & Thanks
Created by LaVague AI. Licensed under Apache-2.0.