Introduction
Scrapling is a Python web scraping framework designed to adapt to website changes automatically. It provides resilient element selection, built-in stealth capabilities, and a unified API that covers everything from static pages to JavaScript-heavy SPAs.
What Scrapling Does
- Provides adaptive CSS and XPath selectors that survive website redesigns
- Handles JavaScript rendering via Playwright integration
- Bypasses common anti-bot protections with stealth mode
- Offers a unified API for static and dynamic page scraping
- Supports automatic retry, rate limiting, and request fingerprinting
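The retry and rate-limiting behaviour listed above can be pictured with a small standard-library sketch. This is not Scrapling's own code: `RateLimiter` and `fetch_with_retry` are hypothetical names, and the `fetch` callable stands in for whatever actually performs the request.

```python
import time


class RateLimiter:
    """Enforces a minimum interval between consecutive requests (hypothetical helper)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock    # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self._last = None     # time of the previous request, if any

    def wait(self):
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now


def fetch_with_retry(fetch, url, limiter, retries=3, backoff=0.5, sleep=time.sleep):
    """Call fetch(url), retrying network-style errors with exponential backoff."""
    last_error = None
    for attempt in range(retries + 1):
        limiter.wait()                        # respect the rate limit on every attempt
        try:
            return fetch(url)
        except OSError as exc:                # ConnectionError, TimeoutError, ...
            last_error = exc
            if attempt < retries:
                sleep(backoff * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise last_error
```

Passing the clock and sleep functions in as parameters is a design choice for testability; a real fetcher would normally keep the defaults.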
Architecture Overview
Scrapling uses a layered approach: a Fetcher layer handles HTTP requests with optional Playwright backing, a Parser layer converts responses into navigable trees, and an Adaptor layer applies smart selectors that learn element positions across page versions. Stealth features operate at the browser fingerprint level.
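To make the layering concrete, here is a minimal sketch of a fetch, parse, select pipeline built only on the standard library. It illustrates the separation of concerns described above, not Scrapling's internals: `CannedFetcher`, `NodeCollector`, `parse`, and `select` are hypothetical names, and the fetcher returns canned HTML instead of making a request.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Node:
    tag: str
    attrs: dict = field(default_factory=dict)
    text: str = ""


class CannedFetcher:
    """Fetcher layer stand-in: serves stored HTML instead of doing real HTTP."""

    def __init__(self, pages):
        self.pages = pages

    def fetch(self, url):
        return self.pages[url]


class NodeCollector(HTMLParser):
    """Parser layer: turns markup into a flat list of Node records.

    Simplification: void elements such as <br> are not special-cased.
    """

    def __init__(self):
        super().__init__()
        self.nodes = []
        self._open = []   # stack of currently open elements

    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs))
        self.nodes.append(node)
        self._open.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed elements.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i].tag == tag:
                del self._open[i:]
                break

    def handle_data(self, data):
        if self._open and data.strip():
            self._open[-1].text += data.strip()


def parse(html):
    collector = NodeCollector()
    collector.feed(html)
    collector.close()
    return collector.nodes


def select(nodes, tag, css_class):
    """Selection layer: exact match on tag name and one CSS class."""
    return [
        n for n in nodes
        if n.tag == tag and css_class in (n.attrs.get("class") or "").split()
    ]


pages = {"https://example.test/": '<div><h1 class="title">Hello</h1><p class="price">9.99</p></div>'}
html = CannedFetcher(pages).fetch("https://example.test/")
print(select(parse(html), "p", "price")[0].text)
```

Because each layer only depends on the output of the one before it, the fetcher could be swapped for a browser-backed one without touching parsing or selection.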
Self-Hosting & Configuration
- Install via pip: pip install scrapling (or install it with the Playwright extras)
- No external services required; runs entirely on the local machine
- Configure request headers, proxies, and rate limits per Fetcher instance
- Enable stealth mode by switching to the StealthyFetcher class
- Supports async operation for high-throughput crawl pipelines
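For the async point above, a high-throughput pipeline usually needs a cap on in-flight requests. The following is a generic asyncio sketch under that assumption, not Scrapling's API: `crawl` and `fake_fetch` are hypothetical, and `fake_fetch` should be replaced by a real async fetch call.

```python
import asyncio


async def crawl(urls, fetch, max_concurrency=5):
    """Fetch every URL concurrently, with at most max_concurrency requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        async with semaphore:            # waits here once the cap is reached
            return url, await fetch(url)

    pairs = await asyncio.gather(*(bounded(u) for u in urls))
    return dict(pairs)


async def fake_fetch(url):
    """Placeholder for a real async fetcher call."""
    await asyncio.sleep(0)
    return f"<html>{url}</html>"


results = asyncio.run(crawl(["https://example.test/1", "https://example.test/2"], fake_fetch))
print(len(results))
```

The semaphore bounds concurrency only; per-host delays or proxy selection would be layered on inside `fetch`.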
Key Features
- Smart selectors that auto-adapt when page structure changes
- Three fetcher types: static, Playwright-based, and stealth
- Built-in response caching and deduplication
- Lightweight with minimal dependencies for the static fetcher
- MCP server integration for use with AI agents
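Response caching and deduplication, as listed above, can be sketched in a few lines. This is an illustration of the general technique rather than Scrapling's implementation: `normalize` and `CachingFetcher` are hypothetical names, and the cache is a plain in-memory dict with no expiry.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Canonicalize a URL so trivially different spellings share one cache key."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Lowercase scheme and host, drop the fragment, keep the query string.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


class CachingFetcher:
    """Wraps a fetch callable with a URL-keyed cache and a content-hash dedup set."""

    def __init__(self, fetch):
        self.fetch = fetch
        self.cache = {}            # normalized URL -> response body
        self.seen_bodies = set()   # SHA-256 digests of bodies already processed

    def get(self, url):
        key = normalize(url)
        if key not in self.cache:
            self.cache[key] = self.fetch(key)
        return self.cache[key]

    def is_duplicate(self, body):
        """Return True if this exact body was seen before, recording it otherwise."""
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if digest in self.seen_bodies:
            return True
        self.seen_bodies.add(digest)
        return False
```

Keying the cache on the normalized URL avoids refetching the same page, while hashing bodies catches identical content served under different URLs.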
Comparison with Similar Tools
- Scrapy — full crawl framework with more boilerplate; Scrapling is simpler for targeted extraction
- BeautifulSoup — parsing only, no fetching or anti-detection
- Playwright — browser automation without scraping-specific helpers
- Crawlee — Node.js focused; Scrapling is Python-native
- Selenium — heavier, older API with no adaptive selectors
FAQ
Q: Does Scrapling require a headless browser? A: Only if you use PlaywrightFetcher or StealthyFetcher. The default Fetcher uses plain HTTP requests.
Q: Can it handle login-protected pages? A: Yes. Pass cookies or use Playwright's persistent context to maintain sessions.
Q: How does the adaptive selector work? A: It stores element signatures and uses fuzzy matching to relocate elements even after class names or the DOM hierarchy change.
Q: Is Scrapling production-ready for large crawls? A: Yes. It supports async fetching, proxy rotation, and rate limiting out of the box.
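As a rough illustration of the adaptive-selector answer above, the sketch below stores a signature for an element and later picks the most similar candidate from a changed page. It assumes a signature made of tag, classes, text, and ancestor path, with arbitrary weights; `Signature`, `similarity`, and `relocate` are hypothetical names, and Scrapling's actual matching logic may differ.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Signature:
    tag: str
    classes: tuple   # CSS class names
    text: str        # visible text content
    path: tuple      # ancestor tag names, root first


def _ratio(a, b):
    """Similarity in [0, 1] between two sequences (strings or tuples)."""
    return SequenceMatcher(None, a, b).ratio()


def similarity(saved, candidate):
    """Weighted score; the weights are illustrative assumptions, not tuned values."""
    tag_score = 1.0 if saved.tag == candidate.tag else 0.0
    class_score = _ratio(" ".join(saved.classes), " ".join(candidate.classes))
    text_score = _ratio(saved.text, candidate.text)
    path_score = _ratio(saved.path, candidate.path)
    return 0.2 * tag_score + 0.2 * class_score + 0.4 * text_score + 0.2 * path_score


def relocate(saved, candidates, threshold=0.5):
    """Return the candidate most similar to the saved signature, or None if none is close."""
    best = max(candidates, key=lambda c: similarity(saved, c), default=None)
    if best is None or similarity(saved, best) < threshold:
        return None
    return best


# Signature captured before a redesign.
saved = Signature("button", ("btn", "btn-buy"), "Add to cart", ("html", "body", "div", "form"))

# Elements found on the redesigned page: classes and nesting have changed.
candidates = [
    Signature("a", ("nav-link",), "Home", ("html", "body", "nav")),
    Signature("button", ("cta-primary",), "Add to cart", ("html", "body", "main", "section", "form")),
    Signature("p", ("description",), "A sturdy widget", ("html", "body", "main")),
]

match = relocate(saved, candidates)
print(match.classes if match else "no match")
```

The threshold guards against returning an unrelated element when the original has been removed from the page entirely.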