Scripts · April 24, 2026 · 1 min read

Cheerio — Fast HTML Parsing with jQuery Syntax for Node.js

A fast, flexible implementation of jQuery core for server-side HTML parsing, traversal, and manipulation in Node.js.

Introduction

Cheerio provides a fast, lean implementation of jQuery's core API for the server. It parses HTML and XML documents into a traversable DOM-like structure, letting you select elements with CSS selectors, read attributes, and manipulate the markup without running a browser or headless engine.

What Cheerio Does

  • Parses HTML and XML strings into a traversable tree structure
  • Selects elements using CSS selectors compatible with jQuery syntax
  • Reads and modifies attributes, text content, and inner HTML
  • Traverses the DOM with parent, children, siblings, find, and filter
  • Serializes the modified tree back to an HTML string

Architecture Overview

Cheerio builds an in-memory DOM tree from raw HTML. In Cheerio 1.0 the default HTML parser is parse5, which follows the HTML specification; htmlparser2 is used when you opt into its faster, more lenient parsing or enable XML mode. The jQuery-style API wraps this tree with selector-based querying powered by css-select, plus DOM manipulation methods. Because there is no browser context, no CSS rendering or JavaScript execution occurs, which keeps Cheerio lightweight and predictable for scraping and template transformation tasks.

Installation & Configuration

  • Install with npm install cheerio; works in Node.js 18+ and modern edge runtimes
  • Call cheerio.load() with an HTML string to create a root query function
  • Pass options to switch between parse5 (spec-compliant, the default for HTML) and htmlparser2 (fast, lenient)
  • Enable XML mode with the xml option to parse XML documents, including self-closing tags
  • Pair with fetch or axios to download pages before parsing

Key Features

  • Familiar jQuery API reduces learning curve for front-end developers
  • Fast parsing without the overhead of a full browser engine
  • Tolerates malformed HTML: parse5 applies the HTML specification's error recovery, and htmlparser2 parses leniently
  • Supports both HTML and XML document processing
  • Lightweight with no native dependencies or browser requirement

Comparison with Similar Tools

  • jsdom — full W3C DOM with script execution but heavier; Cheerio is faster when you only need parsing and selection
  • Puppeteer — controls a real Chromium browser for JS-rendered pages; Cheerio works on static HTML only
  • htmlparser2 — lower-level streaming parser; Cheerio adds the jQuery traversal and manipulation layer
  • BeautifulSoup (Python) — similar concept for Python; Cheerio serves the Node.js ecosystem
  • LinkedOM — faster DOM alternative; Cheerio offers a more familiar jQuery-style API

FAQ

Q: Can Cheerio execute JavaScript in the page? A: No. Cheerio only parses and manipulates static HTML. For JS-rendered pages use Puppeteer or Playwright.

Q: Is Cheerio suitable for web scraping? A: Yes. Fetch the HTML with an HTTP client and pass it to cheerio.load(). Select data with CSS selectors and extract text or attributes.

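Q: Does Cheerio support streaming HTML parsing? A: Yes, in Cheerio 1.0 and later on Node.js. cheerio.stringStream() and cheerio.decodeStream() return writable streams that you can pipe HTML into, and cheerio.fromURL() fetches a URL and parses the response as it arrives.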

Q: How does performance compare to jsdom? A: Cheerio is typically several times faster than jsdom for parsing and querying because it skips browser emulation.
