Introduction
Goquery provides a jQuery-like API for parsing and querying HTML in Go. It combines the golang.org/x/net/html parser with the cascadia CSS selector engine, giving developers a familiar, chainable interface for extracting data from web pages without writing manual tree-walking code.
What Goquery Does
- Parses HTML documents into a traversable node tree
- Supports full CSS3 selector queries via the cascadia library
- Provides chainable methods like Find, Filter, Children, Parents, and Siblings
- Extracts text content, attribute values, and HTML fragments
- Enables DOM manipulation including Append, Remove, and ReplaceWith
Architecture Overview
Goquery wraps the html.Node tree in a Selection type that holds a slice of matched nodes plus a pointer to the root Document. Traversal and filtering methods on Selection return new Selection values, enabling jQuery-style chaining. Selector strings are compiled by cascadia each time a method such as Find is called; for selectors reused in hot paths, compile them once and pass the result to the Matcher variants such as FindMatcher.
Setup & Configuration
- Requires Go 1.18 or later
- Install with go get github.com/PuerkitoBio/goquery
- Create documents from any io.Reader with NewDocumentFromReader, for example a strings.Reader wrapping a string or the Body of an http.Response
- Pair with Go's net/http client for scraping workflows
- Combine with colly or similar crawlers for large-scale extraction
Key Features
- Full CSS3 selector support including pseudo-classes and attribute selectors
- Positional methods: First, Last, Eq, Slice for index-based access
- Traversal methods mirror jQuery: Next, Prev, Parent, Closest
- Attribute helpers: Attr, AttrOr, HasClass, AddClass, RemoveClass
- Zero external C dependencies, pure Go implementation
Comparison with Similar Tools
- Colly — full scraping framework with request scheduling; goquery handles just the parsing layer
- htmlquery — uses XPath instead of CSS selectors for node selection
- x/net/html — raw tokenizer with no selector engine; goquery builds on top of it
- Cascadia — the CSS selector engine goquery uses internally; lower-level API
FAQ
Q: Can goquery execute JavaScript? A: No. Goquery parses static HTML only. For JavaScript-rendered pages, use a headless browser like chromedp and pass the rendered HTML to goquery.
Q: Is goquery safe for concurrent use? A: A Document is safe to read concurrently. Mutations require external synchronization.
Q: How does goquery handle malformed HTML? A: It relies on Go's net/html parser, which implements the HTML5 parsing algorithm and handles malformed markup gracefully.
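Q: Can I modify the DOM and serialize back to HTML? A: Yes. Use manipulation methods to change the tree, then serialize it: Selection.Html returns the inner HTML of the first matched element as a string, goquery.OuterHtml returns its outer HTML, and goquery.Render writes that outer HTML to an io.Writer.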