Introduction
Cheerio provides a fast, lean implementation of jQuery's core API for the server. It parses HTML and XML documents into a traversable DOM-like structure, letting you select elements with CSS selectors, read attributes, and manipulate the markup without running a browser or headless engine.
What Cheerio Does
- Parses HTML and XML strings into a traversable tree structure
- Selects elements using CSS selectors compatible with jQuery syntax
- Reads and modifies attributes, text content, and inner HTML
- Traverses the DOM with parent, children, siblings, find, and filter
- Serializes the modified tree back to an HTML string
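The operations listed above map onto a handful of calls. Below is a minimal sketch, assuming Cheerio 1.x installed from npm. The sample markup and variable names are illustrative only.

```typescript
import * as cheerio from "cheerio";

const html = `
<ul id="fruits">
  <li class="apple">Apple</li>
  <li class="orange">Orange</li>
  <li class="pear">Pear</li>
</ul>`;

// Parse the markup into a tree and get a jQuery-like root function.
const $ = cheerio.load(html);

// Select with a CSS selector and read text content.
const first = $("#fruits li").first().text(); // "Apple"

// Read an attribute.
const orangeClass = $("li.orange").attr("class"); // "orange"

// Traverse: siblings() returns the other <li> elements next to .orange.
const siblingCount = $(".orange").siblings().length; // 2

// Modify: add an attribute and replace the text.
$(".pear").attr("data-ripe", "yes").text("Ripe pear");

// Serialize the modified list back to an HTML string.
const out = $.html("#fruits");

console.log(first, orangeClass, siblingCount);
console.log(out);
```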
Architecture Overview
Cheerio parses HTML with parse5 by default and can optionally use the faster, more forgiving htmlparser2, which is also the parser behind XML mode. Either way the result is an in-memory DOM tree, which the jQuery-style API wraps with selector-based querying powered by css-select and with DOM manipulation methods. Since there is no browser context, no CSS rendering or JavaScript execution occurs, making Cheerio lightweight and predictable for scraping and template transformation tasks.
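The example below illustrates the point about JavaScript execution. It is a minimal sketch, assuming Cheerio 1.x. The markup is a made-up example of a page that fills itself in on the client: the inline script is parsed into the tree but never run, so the placeholder text survives.

```typescript
import * as cheerio from "cheerio";

const html = `
<div id="app">Loading...</div>
<script>document.getElementById("app").textContent = "Rendered";</script>`;

const $ = cheerio.load(html);

// The script is parsed as an element but never executed,
// so #app still holds the server-sent placeholder text.
const appText = $("#app").text(); // "Loading..."

// The script's source remains available as raw markup.
const scriptSource = $("script").html() ?? "";

console.log(appText);
console.log(scriptSource.trim());
```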
Installation & Configuration
- Install via npm; works in Node.js 18+ and modern edge runtimes
- Call cheerio.load() with an HTML string to create a root query function
- parse5 (spec-compliant) is the default HTML parser; pass options to switch to htmlparser2 (fast, lenient) instead
- Enable XML mode (the xml option) to parse XML documents, which preserves tag-name case and honors self-closing tags
- Pair with fetch or axios to download pages before parsing
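XML mode can be sketched as follows. This assumes Cheerio 1.x, where passing xml: true to cheerio.load() selects XML parsing. The catalog document and its field names are hypothetical.

```typescript
import * as cheerio from "cheerio";

const catalogXml = `<?xml version="1.0"?>
<catalog>
  <book id="b1"><cover src="a.png"/><title>First</title></book>
  <book id="b2"><cover src="b.png"/><title>Second</title></book>
</catalog>`;

// xml: true parses in XML mode, so <cover/> is closed immediately
// and <title> is its sibling inside <book>.
const $ = cheerio.load(catalogXml, { xml: true });

// Collect one plain object per <book> element.
const books = $("book")
  .map((_, el) => ({
    id: $(el).attr("id"),
    title: $(el).children("title").text(),
    cover: $(el).children("cover").attr("src"),
  }))
  .get();

// The self-closed <cover/> has no child nodes.
const coverIsEmpty = $("cover").first().children().length === 0;

console.log(books, coverIsEmpty);
```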
Key Features
- Familiar jQuery API reduces learning curve for front-end developers
- Fast parsing without the overhead of a full browser engine
- Copes with malformed HTML: parse5 applies the HTML specification's error recovery, and the optional htmlparser2 parser is lenient as well
- Supports both HTML and XML document processing
- Lightweight with no native dependencies or browser requirement
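Here is a minimal sketch of that error recovery, assuming Cheerio 1.x with its default parser. The broken snippet is made up: it leaves <li> and <p> tags unclosed and includes a stray closing </div>.

```typescript
import * as cheerio from "cheerio";

// Unclosed <li> and <p> tags, plus a stray </div> with no opening tag.
const broken = "<ul><li>One<li>Two</ul><p>First<p>Second</div>";
const $ = cheerio.load(broken);

// Each new <li> implicitly closes the previous one.
const items = $("li").map((_, el) => $(el).text()).get(); // ["One", "Two"]

// Likewise, each new <p> closes the previous paragraph.
const paragraphCount = $("p").length; // 2

console.log(items, paragraphCount);
```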
Comparison with Similar Tools
- jsdom — full W3C DOM with script execution but heavier; Cheerio is faster when you only need parsing and selection
- Puppeteer — controls a real Chromium browser for JS-rendered pages; Cheerio works on static HTML only
- htmlparser2 — lower-level streaming parser; Cheerio adds the jQuery traversal and manipulation layer
- BeautifulSoup (Python) — similar concept for Python; Cheerio serves the Node.js ecosystem
- LinkedOM — lightweight, speed-focused implementation of the standard DOM API; Cheerio offers a jQuery-style API instead
FAQ
Q: Can Cheerio execute JavaScript in the page? A: No. Cheerio only parses and manipulates static HTML. For JS-rendered pages use Puppeteer or Playwright.
Q: Is Cheerio suitable for web scraping? A: Yes. Fetch the HTML with an HTTP client and pass it to cheerio.load(). Select data with CSS selectors and extract text or attributes.
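The fetch-then-load workflow can be sketched as follows. This assumes Cheerio 1.x and the global fetch available in Node.js 18+. The helpers extractLinks and scrapeLinks are hypothetical names. The network step is kept separate so the parsing can be tested offline.

```typescript
import * as cheerio from "cheerio";

interface Link {
  text: string;
  href: string;
}

// Parsing step: collect every link that has an href, resolved to an absolute URL.
function extractLinks(html: string, baseUrl: string): Link[] {
  const $ = cheerio.load(html);
  return $("a[href]")
    .map((_, el) => ({
      text: $(el).text().trim(),
      // Resolve relative hrefs against the page URL.
      href: new URL($(el).attr("href")!, baseUrl).toString(),
    }))
    .get();
}

// Network step: download with the built-in fetch, then hand the HTML to Cheerio.
async function scrapeLinks(url: string): Promise<Link[]> {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return extractLinks(await res.text(), url);
}
```

Usage would look like await scrapeLinks("https://example.com/"). Swapping fetch for axios only changes the network step.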
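Q: Does Cheerio support streaming HTML parsing? A: Partly. Cheerio v1+ can load from a stream: cheerio.stringStream() and cheerio.decodeStream() return writable streams you can pipe markup into, and cheerio.fromURL() fetches and parses a URL in one step. The document becomes queryable only once the stream has finished, so the input is streamed but the selection is not.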
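Here is a minimal sketch of stream loading, assuming Cheerio 1.x's stringStream(options, callback), which returns a Writable. The chunked input is made up to mimic markup arriving piece by piece. The callback fires only after the stream ends.

```typescript
import * as cheerio from "cheerio";
import { Readable } from "node:stream";

// Markup split at arbitrary points, as it might arrive from a socket.
const chunks = ["<ul><li>Al", "pha</li><li>Be", "ta</li></ul>"];

// stringStream() returns a Writable; its callback runs once the stream
// has finished and the whole document has been parsed.
const items: Promise<string[]> = new Promise((resolve, reject) => {
  const sink = cheerio.stringStream({}, (err, $) => {
    if (err) return reject(err);
    resolve($("li").map((_, el) => $(el).text()).get());
  });
  Readable.from(chunks).pipe(sink);
});

items.then((list) => console.log(list));
```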
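Q: How does performance compare to jsdom? A: Cheerio is generally faster than jsdom for parsing and querying because it skips browser emulation: there is no script execution and no full Web platform API surface. The actual speedup depends on document size and workload, so benchmark with your own pages.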