Scripts · Apr 24, 2026 · 3 min read

Cheerio — Fast HTML Parsing with jQuery Syntax for Node.js

A fast, flexible implementation of jQuery core for server-side HTML parsing, traversal, and manipulation in Node.js.

Introduction

Cheerio provides a fast, lean implementation of jQuery's core API for the server. It parses HTML and XML documents into a traversable DOM-like structure, letting you select elements with CSS selectors, read attributes, and manipulate the markup without running a browser or headless engine.

What Cheerio Does

  • Parses HTML and XML strings into a traversable tree structure
  • Selects elements using CSS selectors compatible with jQuery syntax
  • Reads and modifies attributes, text content, and inner HTML
  • Traverses the DOM with parent, children, siblings, find, and filter
  • Serializes the modified tree back to an HTML string
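
The capabilities listed above can be sketched in a few lines. This is a minimal illustration, not a canonical recipe: the fruit list markup and the data-ripe attribute are made up for the example, and it assumes Cheerio 1.0 or later installed from npm.

```typescript
import * as cheerio from "cheerio";

// Sample markup (hypothetical, for illustration only).
const html = `
  <ul id="fruits">
    <li class="apple">Apple</li>
    <li class="orange">Orange</li>
    <li class="pear">Pear</li>
  </ul>`;

// Passing `false` as the third argument parses the input as a fragment,
// so no <html>/<head>/<body> wrapper is added around it.
const $ = cheerio.load(html, null, false);

// Select with a CSS selector and collect the text of each match.
const names: string[] = $("#fruits li")
  .map((_, el) => $(el).text())
  .get();

// Write an attribute, then read it back.
$("li.apple").attr("data-ripe", "yes");
const ripe = $("li.apple").attr("data-ripe");

// Traverse: next element sibling and parent.
const afterApple = $("li.apple").next().text();
const parentId = $("li.pear").parent().attr("id");

// Remove a node and serialize the modified tree back to a string.
$("li.pear").remove();
const output = $.html();

console.log(names, ripe, afterApple, parentId);
console.log(output);
```

Each call mirrors its jQuery counterpart, but everything runs against an in-memory tree, so the only way to observe a change is to read it back or serialize it with $.html().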

Architecture Overview

Cheerio builds an in-memory DOM tree from raw HTML using parse5 by default, which follows the HTML specification, or htmlparser2 when you opt into its faster, more lenient parsing or XML mode. The jQuery-style API wraps this tree with selector-based querying powered by css-select and with DOM manipulation methods. Because there is no browser context, no CSS rendering or JavaScript execution occurs, which keeps Cheerio lightweight and predictable for scraping and template transformation tasks.
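
To make the "no JavaScript execution" point concrete, here is a small sketch. The page markup is hypothetical: an inline script that would fill a div in a browser. Under Cheerio the script is just another node in the tree and never runs.

```typescript
import * as cheerio from "cheerio";

// Hypothetical page: in a browser, the inline script would fill #app.
const page = `
  <div id="app"></div>
  <script>document.getElementById("app").textContent = "rendered";</script>`;

const $ = cheerio.load(page);

// The script element is parsed and selectable like any other node...
const scriptCount = $("script").length;

// ...but it was never executed, so #app is still empty.
const appText = $("#app").text();

console.log({ scriptCount, appText });
```

If the content you need only appears after scripts run, the static tree will not contain it.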

Installation & Configuration

  • Install via npm; works in Node.js 18+ and modern edge runtimes
  • Call cheerio.load() with an HTML string to create a root query function
  • Pass options to switch from the default parse5 parser (spec-compliant) to htmlparser2 (fast, lenient)
  • Set the xml option (for example, { xml: true }) to parse XML documents, including self-closing tags
  • Pair with fetch or axios to download pages before parsing
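
A minimal sketch of the configuration steps above, assuming Cheerio 1.0 or later installed with npm install cheerio. The feed markup is invented for the example. It contrasts the default HTML mode with the xml option.

```typescript
import * as cheerio from "cheerio";

// Default HTML mode: the input is treated as a full document,
// so the parser supplies the <html>, <head>, and <body> elements.
const $html = cheerio.load("<p>Hi</p>");
const wrapped = $html.html();

// XML mode: `xml: true` switches to htmlparser2 with XML rules,
// so <link> keeps its text and <enclosure/> is read as self-closing.
// (Sample feed is hypothetical.)
const feed = `<rss><channel>
  <item><title>First</title><link>https://example.com/1</link></item>
  <item><title>Second</title><enclosure url="a.mp3"/></item>
</channel></rss>`;

const $xml = cheerio.load(feed, { xml: true });

const titles: string[] = $xml("item > title")
  .map((_, el) => $xml(el).text())
  .get();
const firstLink = $xml("item > link").first().text();
const enclosureUrl = $xml("enclosure").attr("url");

console.log(wrapped);
console.log(titles, firstLink, enclosureUrl);
```

XML mode matters for feeds like this one because HTML parsing treats link as a void element and would drop its text.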

Key Features

  • Familiar jQuery API reduces the learning curve for front-end developers
  • Fast parsing without the overhead of a full browser engine
  • Works with malformed HTML: parse5 applies the HTML specification's error recovery, and htmlparser2 parses leniently
  • Supports both HTML and XML document processing
  • Lightweight with no native dependencies or browser requirement

Comparison with Similar Tools

  • jsdom — full W3C DOM with script execution but heavier; Cheerio is faster when you only need parsing and selection
  • Puppeteer — controls a real Chromium browser for JS-rendered pages; Cheerio works on static HTML only
  • htmlparser2 — lower-level streaming parser; Cheerio adds the jQuery traversal and manipulation layer
  • BeautifulSoup (Python) — similar concept for Python; Cheerio serves the Node.js ecosystem
  • LinkedOM — faster DOM alternative; Cheerio offers a more familiar jQuery-style API

FAQ

Q: Can Cheerio execute JavaScript in the page? A: No. Cheerio only parses and manipulates static HTML. For JS-rendered pages use Puppeteer or Playwright.

Q: Is Cheerio suitable for web scraping? A: Yes. Fetch the HTML with an HTTP client and pass it to cheerio.load(). Select data with CSS selectors and extract text or attributes.
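
One way the scraping flow described above might look, as a sketch rather than a production scraper. The names extractLinks and scrapeLinks and the Link type are hypothetical helpers introduced for this example, and the global fetch assumes Node.js 18 or later.

```typescript
import * as cheerio from "cheerio";

// Hypothetical result shape for this example.
interface Link {
  text: string;
  href: string;
}

// Pure parsing step: takes HTML that has already been downloaded.
function extractLinks(html: string, baseUrl: string): Link[] {
  const $ = cheerio.load(html);
  const links: Link[] = [];
  $("a[href]").each((_, el) => {
    const raw = $(el).attr("href");
    if (!raw) return; // skip empty href values
    links.push({
      text: $(el).text().trim(),
      // Resolve relative URLs against the page address.
      href: new URL(raw, baseUrl).href,
    });
  });
  return links;
}

// Network step: download the page, then hand the markup to the parser.
async function scrapeLinks(url: string): Promise<Link[]> {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
  return extractLinks(await res.text(), url);
}

const sample = '<a href="/docs"> Docs </a><a>No href</a>';
console.log(extractLinks(sample, "https://example.com/"));
```

Keeping the parsing function separate from the download makes it easy to test against saved HTML without touching the network.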

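Q: Does Cheerio support streaming HTML parsing? A: Yes, in Node.js. Cheerio 1.0 and later provide cheerio.stringStream() and cheerio.decodeStream(), which return writable streams you can pipe HTML into, and cheerio.fromURL(), which fetches a URL and parses the response. The parsed document becomes available once the stream has ended.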
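
A minimal sketch of feeding HTML to Cheerio in chunks, assuming the Node.js build of Cheerio 1.0 or later, which exports cheerio.stringStream(). The helper name loadFromChunks is hypothetical, and the chunk array stands in for a real source such as a file or HTTP response stream.

```typescript
import * as cheerio from "cheerio";
import { Readable } from "node:stream";

// Hypothetical helper: pipes string chunks into Cheerio's writable stream
// and resolves with the query function once the stream has finished.
function loadFromChunks(chunks: string[]): Promise<cheerio.CheerioAPI> {
  return new Promise((resolve, reject) => {
    const writable = cheerio.stringStream({}, (err, $) => {
      if (err) reject(err);
      else resolve($);
    });
    Readable.from(chunks).pipe(writable);
  });
}

// The markup arrives in two pieces; the split falls between elements.
loadFromChunks(["<ul><li>one</li>", "<li>two</li></ul>"]).then(($) => {
  console.log($("li").length, $("li").last().text());
});
```

The callback fires only after the input ends, so this approach reduces buffering on the way in but still yields a complete tree rather than incremental results.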

Q: How does performance compare to jsdom? A: Cheerio is typically several times faster than jsdom for parsing and querying because it skips browser emulation.
