# Goquery: jQuery-Style HTML Parsing for Go

> A Go package that brings jQuery-like syntax for traversing and manipulating HTML documents. Built on top of the golang.org/x/net/html parser and the cascadia CSS selector library.

## Quick Use

```bash
go get github.com/PuerkitoBio/goquery
```

```go
// resp is an *http.Response obtained elsewhere, for example from http.Get.
doc, err := goquery.NewDocumentFromReader(resp.Body)
if err != nil {
	log.Fatal(err)
}
doc.Find("h2.title").Each(func(i int, s *goquery.Selection) {
	fmt.Println(s.Text()) // print the text of each matched heading
})
```

## Introduction

Goquery provides a jQuery-like API for parsing and querying HTML in Go. It combines the golang.org/x/net/html parser with the cascadia CSS selector engine, giving developers a familiar, chainable interface for extracting data from web pages without writing manual tree-walking code.

## What Goquery Does

- Parses HTML documents into a traversable node tree
- Supports CSS selector queries via the cascadia library
- Provides chainable methods like Find, Filter, Children, Parents, and Siblings
- Extracts text content, attribute values, and HTML fragments
- Enables DOM manipulation including Append, Remove, and ReplaceWith

## Architecture Overview

Goquery wraps the html.Node tree in a Selection type that holds a slice of matched nodes plus a pointer to the root Document. Methods on Selection return new Selection values, enabling jQuery-style chaining. Selector strings are compiled by cascadia each time a method such as Find is called; to reuse a compiled selector across repeated queries, compile it once and pass it to the Matcher variants such as FindMatcher.
## Setup & Configuration

- Requires Go 1.18 or later for goquery v1.9.x; newer releases may require a newer Go version, so check the go.mod of the version you install
- Install with `go get github.com/PuerkitoBio/goquery`
- Create documents from any io.Reader, such as a strings.NewReader wrapping a string or the Body of an http.Response
- Pair with Go's net/http client for scraping workflows
- Combine with colly or similar crawlers for large-scale extraction

## Key Features

- CSS selector support including pseudo-classes and attribute selectors
- Positional methods: First, Last, Eq, Slice for index-based access
- Traversal methods mirror jQuery: Next, Prev, Parent, Closest
- Attribute helpers: Attr, AttrOr, HasClass, AddClass, RemoveClass
- Zero external C dependencies, pure Go implementation

## Comparison with Similar Tools

- **Colly**: full scraping framework with request scheduling; goquery handles just the parsing layer
- **htmlquery**: uses XPath instead of CSS selectors for node selection
- **x/net/html**: HTML5 tokenizer and parser with no selector engine; goquery builds on top of it
- **Cascadia**: the CSS selector engine goquery uses internally; lower-level API

## FAQ

**Q: Can goquery execute JavaScript?**
A: No. Goquery parses static HTML only. For JavaScript-rendered pages, use a headless browser like chromedp and pass the rendered HTML to goquery.

**Q: Is goquery safe for concurrent use?**
A: Read-only queries do not modify the node tree, so reading a Document from multiple goroutines is generally fine. Mutations require external synchronization.

**Q: How does goquery handle malformed HTML?**
A: It relies on the golang.org/x/net/html parser, which implements the HTML5 parsing algorithm and handles malformed markup gracefully.

**Q: Can I modify the DOM and serialize back to HTML?**
A: Yes. Use manipulation methods to change the tree, then call goquery.OuterHtml or Selection.Html to get the modified HTML as a string, or goquery.Render to write it to an io.Writer. Note that Selection.Html returns the inner HTML of the first matched element only.

## Sources

- https://github.com/PuerkitoBio/goquery
- https://pkg.go.dev/github.com/PuerkitoBio/goquery

---

Source: https://tokrepo.com/en/workflows/asset-9b5f4ad9
Author: AI Open Source