Introduction
Puppeteer is a Node.js library maintained by the Chrome DevTools team that provides a high-level API to control Chrome or Firefox over the Chrome DevTools Protocol or WebDriver BiDi. It runs in headless mode by default but can be configured to run in headed (visible) mode for debugging or visual testing.
What Puppeteer Does
- Automates browser interactions including navigation, form filling, and clicking
- Generates screenshots and PDFs of web pages
- Crawls single-page applications and generates pre-rendered content
- Runs end-to-end tests in a real browser environment
- Intercepts and modifies network requests for testing and scraping
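As a rough illustration of the screenshot and PDF use case, the following sketch loads one page and writes both artifacts. The function names, the output naming scheme, and the choice of networkidle2 as the wait condition are assumptions made for this example, not something the library prescribes.

```javascript
// Sketch: capture a screenshot and a PDF of a single page.
// outputPaths() and capture() are illustrative names, not Puppeteer APIs.

// Pure helper: derive predictable output file names from the page's hostname.
function outputPaths(url, dir = ".") {
  const host = new URL(url).hostname.replace(/[^a-z0-9.-]/gi, "_");
  return { png: `${dir}/${host}.png`, pdf: `${dir}/${host}.pdf` };
}

async function capture(url, dir = ".") {
  // Required lazily so outputPaths() stays usable without a browser installed.
  const puppeteer = require("puppeteer");
  const browser = await puppeteer.launch(); // headless by default
  try {
    const page = await browser.newPage();
    // Wait until the network is mostly quiet so late-loading content is rendered.
    await page.goto(url, { waitUntil: "networkidle2" });
    const { png, pdf } = outputPaths(url, dir);
    await page.screenshot({ path: png, fullPage: true });
    await page.pdf({ path: pdf, format: "A4" });
    return { png, pdf };
  } finally {
    await browser.close(); // always release the browser process
  }
}
```

Calling capture("https://example.com", "out") from a script should leave a PNG and a PDF in an existing out directory; the URL here is only a placeholder.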
Architecture Overview
Puppeteer communicates with a browser instance via the Chrome DevTools Protocol (CDP) or WebDriver BiDi. When you call puppeteer.launch(), it spawns a browser process and establishes a WebSocket connection to it. Each tab is represented as a Page object, and every interaction is sent to the browser as a protocol command. The puppeteer package downloads a compatible browser build at install time, though you can point it at an existing Chrome or Firefox installation instead.
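The launch, connect, and Page lifecycle described above can be sketched as follows. This assumes the default Chrome launch over a WebSocket; inspectSession() and describeEndpoint() are hypothetical helper names introduced only for this example.

```javascript
// Sketch: observe the pieces of the architecture from a script.

// Pure helper: split a browser WebSocket endpoint into its parts.
function describeEndpoint(endpoint) {
  const u = new URL(endpoint);
  return { protocol: u.protocol, host: u.hostname, port: Number(u.port) };
}

async function inspectSession(url) {
  // Required lazily so describeEndpoint() works without Puppeteer installed.
  const puppeteer = require("puppeteer");
  const browser = await puppeteer.launch(); // spawns the browser process
  try {
    // The WebSocket URL the library uses to talk to the browser.
    const endpoint = browser.wsEndpoint();
    const page = await browser.newPage(); // one Page object per tab
    await page.goto(url); // sent to the browser as protocol commands
    const title = await page.title();
    const tabs = (await browser.pages()).length;
    return { endpoint: describeEndpoint(endpoint), title, tabs };
  } finally {
    await browser.close();
  }
}
```

The endpoint returned by browser.wsEndpoint() is the same kind of URL that puppeteer.connect() accepts, which is what makes attaching to an already running browser possible.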
Self-Hosting & Configuration
- Install via npm install puppeteer (downloads Chromium) or npm install puppeteer-core (BYO browser)
- Set PUPPETEER_EXECUTABLE_PATH to use a custom browser binary
- Configure launch options like --no-sandbox for containerized environments
- Use puppeteer.connect() to attach to a remote browser instance via WebSocket
- Docker images such as ghcr.io/puppeteer/puppeteer include all system dependencies
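One way to combine these options in a script is sketched below, using puppeteer-core so that the browser choice is explicit. IN_CONTAINER and BROWSER_WS_ENDPOINT are hypothetical environment variable names invented for this example; only PUPPETEER_EXECUTABLE_PATH comes from the list above, and it is read by hand here rather than relying on the library to pick it up.

```javascript
// Sketch: derive launch settings from the environment.

// Pure helper: build the options object passed to launch().
function buildLaunchOptions(env = process.env) {
  const options = { headless: true, args: [] };
  if (env.PUPPETEER_EXECUTABLE_PATH) {
    options.executablePath = env.PUPPETEER_EXECUTABLE_PATH;
  }
  // IN_CONTAINER is a made-up flag for this example, not a Puppeteer setting.
  if (env.IN_CONTAINER === "1") {
    options.args.push("--no-sandbox", "--disable-setuid-sandbox");
  }
  return options;
}

async function openBrowser(env = process.env) {
  // Required lazily so buildLaunchOptions() can be used on its own.
  const puppeteer = require("puppeteer-core");
  // BROWSER_WS_ENDPOINT is also a made-up name: when set, attach to a
  // remote browser instead of starting a local one.
  if (env.BROWSER_WS_ENDPOINT) {
    return puppeteer.connect({ browserWSEndpoint: env.BROWSER_WS_ENDPOINT });
  }
  // puppeteer-core downloads nothing, so launching needs a browser path.
  return puppeteer.launch(buildLaunchOptions(env));
}
```

A browser obtained through connect() is usually released with browser.disconnect() rather than browser.close(), so the remote process keeps running for other clients.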
Key Features
- Full CDP and experimental WebDriver BiDi support
- Built-in request interception for mocking API responses
- Automatic waiting and smart element selectors
- Network throttling and device emulation for mobile testing
- First-class TypeScript definitions
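To make the request interception feature more concrete, here is a minimal sketch that mocks JSON responses for one path prefix and blocks images. The /api/ prefix, the mocked payload, and the planRequest() and enableMocking() names are assumptions chosen for illustration.

```javascript
// Sketch: mock API responses with request interception.

// Pure decision function: what to do with a request, given its URL and type.
function planRequest(url, resourceType) {
  if (new URL(url).pathname.startsWith("/api/")) {
    return {
      action: "respond",
      response: {
        status: 200,
        contentType: "application/json",
        body: JSON.stringify({ mocked: true }),
      },
    };
  }
  if (resourceType === "image") return { action: "abort" };
  return { action: "continue" };
}

// Wire the decision function into a Puppeteer Page.
async function enableMocking(page) {
  await page.setRequestInterception(true);
  page.on("request", (request) => {
    // Skip requests another handler has already resolved.
    if (request.isInterceptResolutionHandled()) return;
    const plan = planRequest(request.url(), request.resourceType());
    if (plan.action === "respond") request.respond(plan.response);
    else if (plan.action === "abort") request.abort();
    else request.continue();
  });
}
```

Keeping the decision logic separate from the page wiring means the mocking rules can be unit tested without a browser. Once interception is enabled, every request has to be continued, answered, or aborted, otherwise the page stalls waiting on it.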
Comparison with Similar Tools
- Playwright — Multi-browser from day one with built-in test runner; Puppeteer is Chrome/Firefox focused
- Selenium — Language-agnostic via WebDriver; Puppeteer is Node.js only but closer to the metal
- Cypress — Opinionated test framework with time-travel debugging; Puppeteer is a lower-level library
- Crawlee — Built on Puppeteer/Playwright with queue management for large-scale scraping
FAQ
Q: Does Puppeteer work with Firefox?
A: Yes. Firefox support via WebDriver BiDi is available as an experimental feature since Puppeteer v21.
Q: Can I run Puppeteer in Docker?
A: Yes. Use the official container image or install Chromium system dependencies manually. Pass --no-sandbox when running as root.
Q: How does Puppeteer differ from puppeteer-core?
A: The puppeteer package downloads a compatible browser automatically. puppeteer-core skips the download and expects you to provide a browser path.
Q: Is Puppeteer suitable for production scraping?
A: It works for moderate workloads. For large-scale crawling, consider Crawlee or a dedicated scraping framework that handles retries and queues.