Puppeteer — Headless Chrome Automation Library by Google

Introduction

Puppeteer is a Node.js library maintained by the Chrome DevTools team that provides a high-level API to control Chrome or Firefox over the Chrome DevTools Protocol (CDP) or WebDriver BiDi. It runs the browser in headless mode by default; passing headless: false to puppeteer.launch() opens a visible (headed) window for debugging or visual testing.

What Puppeteer Does

Automates browser interactions including navigation, form filling, and clicking
Generates screenshots and PDFs of web pages
Crawls single-page applications and generates pre-rendered content
Runs end-to-end tests in a real browser environment
Intercepts and modifies network requests for testing and scraping
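
The navigation, screenshot, and PDF items above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not Puppeteer's own code: PageLike, BrowserLike, and LauncherLike are hypothetical interfaces that describe only the few methods used here (they mirror launch(), newPage(), goto(), screenshot(), pdf(), and close() from the Puppeteer API), and capturePage is an illustrative name. Typing against these small interfaces lets the function be exercised with a stub when no browser is installed.

```typescript
// Hypothetical minimal shapes: only the Puppeteer methods this sketch calls.
interface PageLike {
  goto(url: string, options?: { waitUntil?: "load" | "networkidle0" }): Promise<unknown>;
  screenshot(options?: { fullPage?: boolean }): Promise<Uint8Array>;
  pdf(options?: { format?: "A4" }): Promise<Uint8Array>;
}
interface BrowserLike {
  newPage(): Promise<PageLike>;
  close(): Promise<void>;
}
interface LauncherLike {
  launch(options?: { headless?: boolean }): Promise<BrowserLike>;
}

// Opens a URL and returns the page rendered as PNG bytes and as PDF bytes.
async function capturePage(launcher: LauncherLike, url: string) {
  // Headless is already the default; it is spelled out here for clarity.
  const browser = await launcher.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // Wait until the network has gone quiet so late-loading content is rendered.
    await page.goto(url, { waitUntil: "networkidle0" });
    const png = await page.screenshot({ fullPage: true });
    const pdf = await page.pdf({ format: "A4" });
    return { png, pdf };
  } finally {
    // Always close the browser, even if navigation or rendering failed.
    await browser.close();
  }
}
```

With Puppeteer installed, the default export of the puppeteer package should be usable as the launcher argument, for example capturePage(puppeteer, "https://example.com"); a cast may be needed if the type definitions in your version differ from the assumed shapes.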

Architecture Overview

Puppeteer communicates with a browser instance via the Chrome DevTools Protocol (CDP) or WebDriver BiDi. When you call puppeteer.launch(), it spawns a browser process and establishes a WebSocket connection to it. Each tab is represented as a Page object, and every interaction with a tab is sent as a protocol command over that connection. The puppeteer package downloads a compatible browser build at install time, though you can point it at an existing Chrome or Firefox installation instead.
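
To make the process, connection, and tab model described above more concrete, here is a small sketch. It assumes hypothetical interfaces (TabLike, TabbedBrowser, TabLauncher) that cover only the methods it calls; these mirror launch(), wsEndpoint(), newPage(), pages(), goto(), title(), and close() from the Puppeteer API. The function name openTabs is illustrative.

```typescript
// Hypothetical minimal shapes: only the Puppeteer methods this sketch calls.
interface TabLike {
  goto(url: string): Promise<unknown>;
  title(): Promise<string>;
}
interface TabbedBrowser {
  wsEndpoint(): string;
  newPage(): Promise<TabLike>;
  pages(): Promise<TabLike[]>;
  close(): Promise<void>;
}
interface TabLauncher {
  launch(): Promise<TabbedBrowser>;
}

// Launches one browser process, opens one tab (Page) per URL, and reads each title.
async function openTabs(launcher: TabLauncher, urls: string[]) {
  const browser = await launcher.launch();
  try {
    // WebSocket URL of the browser's protocol endpoint.
    const endpoint = browser.wsEndpoint();
    const titles: string[] = [];
    for (const url of urls) {
      const page = await browser.newPage(); // each tab is its own Page object
      await page.goto(url);
      titles.push(await page.title());
    }
    // A real browser may report one extra tab: the blank one it started with.
    const tabCount = (await browser.pages()).length;
    return { endpoint, titles, tabCount };
  } finally {
    await browser.close();
  }
}
```

The endpoint string returned here is the kind of URL that puppeteer.connect() accepts when attaching to an already running browser.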

Self-Hosting & Configuration

Install with npm install puppeteer (downloads a compatible browser) or npm install puppeteer-core (no download; you supply the browser)
Set PUPPETEER_EXECUTABLE_PATH to use a custom browser binary
Pass Chromium flags such as --no-sandbox through the args launch option in containerized environments
Use puppeteer.connect() to attach to a remote browser instance via WebSocket
Docker images such as ghcr.io/puppeteer/puppeteer include all system dependencies
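
The choices in this list can be collected in one place. The sketch below is an illustration under assumptions, not a Puppeteer API: planBrowser and the BrowserPlan type are hypothetical, and so are the environment variable names BROWSER_WS_ENDPOINT and IN_CONTAINER. PUPPETEER_EXECUTABLE_PATH and --no-sandbox come from the list above, and the option names headless, args, executablePath, and browserWSEndpoint are the ones accepted by puppeteer.launch() and puppeteer.connect().

```typescript
// Hypothetical result type: either attach to a remote browser or launch a local one.
type BrowserPlan =
  | { mode: "connect"; options: { browserWSEndpoint: string } }
  | { mode: "launch"; options: { headless: boolean; args: string[]; executablePath?: string } };

// Decides how to obtain a browser from environment variables.
// BROWSER_WS_ENDPOINT and IN_CONTAINER are made-up names for this sketch.
function planBrowser(env: Record<string, string | undefined>): BrowserPlan {
  const remote = env.BROWSER_WS_ENDPOINT;
  if (remote) {
    // A remote browser is already running: attach over WebSocket instead of launching.
    return { mode: "connect", options: { browserWSEndpoint: remote } };
  }

  const options: { headless: boolean; args: string[]; executablePath?: string } = {
    headless: true,
    args: [],
  };
  if (env.IN_CONTAINER === "1") {
    // Commonly required when the browser runs as root inside a container.
    options.args.push("--no-sandbox");
  }
  if (env.PUPPETEER_EXECUTABLE_PATH) {
    // Made explicit here for illustration: use a custom browser binary.
    options.executablePath = env.PUPPETEER_EXECUTABLE_PATH;
  }
  return { mode: "launch", options };
}
```

A caller would typically run planBrowser(process.env) and then hand plan.options to puppeteer.connect() or puppeteer.launch() depending on plan.mode. Keeping the decision in a pure function makes it easy to check without starting a browser.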

Key Features

Full CDP and experimental WebDriver BiDi support
Built-in request interception for mocking API responses
Locators that wait for elements automatically, plus text, ARIA, and XPath selectors
Network throttling and device emulation for mobile testing
First-class TypeScript definitions
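
As an illustration of the request interception feature listed above, the following sketch mocks one API response, blocks images, and lets everything else through. It is a sketch under assumptions: RequestLike and InterceptablePage are hypothetical interfaces covering only the methods used (they mirror url(), resourceType(), respond(), abort(), and continue() on Puppeteer's request object, and setRequestInterception() and on("request", ...) on a page). The names handleRequest and enableMocking, the /api/user path, and the mock payload are made up for the example.

```typescript
// Hypothetical minimal shapes: only the Puppeteer methods this sketch calls.
interface RequestLike {
  url(): string;
  resourceType(): string;
  respond(response: { status: number; contentType: string; body: string }): Promise<void>;
  abort(): Promise<void>;
  continue(): Promise<void>;
}
interface InterceptablePage {
  setRequestInterception(enabled: boolean): Promise<void>;
  on(event: "request", handler: (request: RequestLike) => void): unknown;
}

// Resolves an intercepted request exactly once: mock it, block it, or pass it on.
async function handleRequest(
  request: RequestLike,
  mockPath: string = "/api/user", // illustrative endpoint
): Promise<"mocked" | "blocked" | "passed"> {
  if (request.url().endsWith(mockPath)) {
    // Answer from the test itself instead of hitting the real backend.
    await request.respond({
      status: 200,
      contentType: "application/json",
      body: JSON.stringify({ id: 1, name: "Test User" }),
    });
    return "mocked";
  }
  if (request.resourceType() === "image") {
    await request.abort(); // skip images to speed up the page load
    return "blocked";
  }
  await request.continue(); // let the request go to the network unchanged
  return "passed";
}

// Turns interception on and routes every request through handleRequest.
async function enableMocking(page: InterceptablePage): Promise<void> {
  await page.setRequestInterception(true);
  page.on("request", (request) => {
    void handleRequest(request);
  });
}
```

Once interception is enabled, every request stays paused until it is resolved, which is why each branch ends in exactly one of respond(), abort(), or continue().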

Comparison with Similar Tools

Playwright — Multi-browser from day one with built-in test runner; Puppeteer is Chrome/Firefox focused
Selenium — Language-agnostic via WebDriver; Puppeteer is Node.js only but closer to the metal
Cypress — Opinionated test framework with time-travel debugging; Puppeteer is a lower-level library
Crawlee — Built on Puppeteer/Playwright with queue management for large-scale scraping

FAQ

Q: Does Puppeteer work with Firefox? A: Yes. Firefox support via WebDriver BiDi has been available as an experimental feature since Puppeteer v21.

Q: Can I run Puppeteer in Docker? A: Yes. Use the official container image or install Chromium system dependencies manually. Pass --no-sandbox when running as root.

Q: How does Puppeteer differ from puppeteer-core? A: The puppeteer package downloads a compatible browser automatically. puppeteer-core skips the download and expects you to provide a browser path.

Q: Is Puppeteer suitable for production scraping? A: It works for moderate workloads. For large-scale crawling, consider Crawlee or a dedicated scraping framework that handles retries and queues.
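
To give a sense of what "handles retries" involves, here is a minimal sketch of a retry wrapper with exponential backoff that could be placed around a single Puppeteer task. It is not taken from Crawlee or any other framework; withRetries, its parameters, and the default values are all hypothetical, and a real crawler would add queueing, concurrency limits, and error classification on top.

```typescript
// Runs an async task up to maxAttempts times, doubling the wait after each failure.
// The sleep function is injectable so the backoff can be observed without real delays.
async function withRetries<T>(
  task: (attempt: number) => Promise<T>,
  maxAttempts: number = 3,
  baseDelayMs: number = 500,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Waits baseDelayMs, then 2x, then 4x, and so on between attempts.
        await sleep(baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  // Every attempt failed: surface the most recent error to the caller.
  throw lastError;
}
```

A page visit would be passed in as the task, for example withRetries(() => scrape(url)) where scrape is your own function that opens a page and extracts data.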
