Crawlee — Web Scraping and Browser Automation Library
Build reliable web scrapers in Node.js or Python. Crawlee handles proxy rotation, browser fingerprints, auto-scaling, and anti-bot bypassing out of the box.
Ready-to-run agent install
This asset can be installed after the agent chooses its runtime, checks the plan, and runs the matching command.
npx -y tokrepo@latest install 8f2c0ae9-1327-481f-a519-d473751bdd76 --target codexRun after dry-run confirms the install plan.
What it is
Crawlee is a web scraping and browser automation library for Node.js and Python. It handles the hard parts of web scraping: proxy rotation, browser fingerprints, automatic retries, request queuing, auto-scaling, and anti-bot bypassing. Crawlee supports HTTP crawling (Cheerio/BeautifulSoup), headless browsers (Playwright/Puppeteer), and adaptive switching between modes.
Crawlee targets developers building production web scrapers who need reliability and scale. Instead of writing retry logic, proxy management, and fingerprint rotation from scratch, Crawlee provides these as built-in features.
How it saves time or tokens
Building a reliable web scraper means handling rate limiting, CAPTCHAs, IP blocks, JavaScript rendering, and data extraction. Crawlee bundles all of these concerns into a single library. The auto-scaling feature adjusts concurrency based on system resources and target server response times. Proxy rotation and browser fingerprint management reduce blocks without custom code.
How to use
- Create a new scraper:
npx crawlee create my-scraper
cd my-scraper
npm start
- Or install manually:
npm install crawlee playwright
- Python version:
pip install crawlee[playwright]
Example
import { PlaywrightCrawler } from 'crawlee';
const crawler = new PlaywrightCrawler({
maxRequestsPerCrawl: 100,
async requestHandler({ page, request, enqueueLinks }) {
const title = await page.title();
const price = await page.$eval('.price', el => el.textContent);
console.log(`${request.url}: ${title} - ${price}`);
// Follow pagination links
await enqueueLinks({
selector: '.pagination a',
strategy: 'same-domain',
});
},
});
await crawler.run(['https://example.com/products']);
# Python version
from crawlee.playwright_crawler import PlaywrightCrawler
crawler = PlaywrightCrawler(max_requests_per_crawl=100)
@crawler.router.default_handler
async def handler(context):
title = await context.page.title()
context.log.info(f'{context.request.url}: {title}')
await context.enqueue_links(strategy='same-domain')
await crawler.run(['https://example.com'])
Related on TokRepo
- Web Scraping Tools — Scraping and data extraction tools
- Browser Automation — Automate browser interactions
This tool integrates with standard development workflows and requires minimal configuration to get started. It is available as open-source software with documentation and community support through the official repository. The project follows semantic versioning for stable releases.
For teams evaluating this tool, the key advantage is reducing manual work in repetitive tasks. The automation provided by the built-in features means less custom code to maintain and fewer integration points to manage. This translates directly to lower maintenance costs and faster iteration cycles.
Common pitfalls
- PlaywrightCrawler launches real browsers which consume significant memory; use CheerioCrawler for pages that do not require JavaScript rendering.
- Proxy rotation requires proxy URLs configured in the crawler options; Crawlee does not provide proxies, only the rotation logic.
- Respect robots.txt and website terms of service; Crawlee provides the technical capability but compliance is your responsibility.
Frequently Asked Questions
CheerioCrawler makes HTTP requests and parses HTML with Cheerio (no browser). PlaywrightCrawler launches a headless browser for pages that require JavaScript rendering. Use CheerioCrawler when possible for better performance and lower resource usage.
Yes. Crawlee includes browser fingerprint rotation, request header randomization, and session management to reduce detection. For advanced anti-bot systems, combine with proxy rotation and human-like browsing patterns.
Yes. Crawlee has official Python support with the same features as the Node.js version. Install with pip install crawlee[playwright] for browser-based scraping or crawlee[beautifulsoup] for HTTP scraping.
Yes. Crawlee includes auto-scaling that adjusts concurrency based on system resources and server response times. The request queue handles millions of URLs with automatic deduplication and retry logic.
Yes. Crawlee is open-source under the Apache 2.0 license. It is developed by Apify but can be used independently without an Apify account. Apify offers a managed platform for running Crawlee scrapers in the cloud.
Citations (3)
- Crawlee GitHub— Crawlee handles proxy rotation, fingerprints, and anti-bot bypassing
- Crawlee Documentation— Crawlee supports Node.js and Python with Playwright and Cheerio/BeautifulSoup
- Crawlee Official Site— Crawlee is open-source under Apache 2.0 by Apify
Related on TokRepo
Source & Thanks
- GitHub: apify/crawlee (16k+ stars)
- Docs: crawlee.dev
Discussion
Related Assets
Obscura — Headless Browser Built for AI Agents and Web Scraping
A high-performance headless browser written in Rust, designed specifically for AI agent workflows and large-scale web scraping with built-in stealth and anti-detection capabilities.
Lightpanda — High-Performance Headless Browser for AI and Automation
Lightpanda is a headless browser built in Zig that is designed for web scraping, AI agent browsing, and automation workloads. It focuses on speed and low resource usage, executing JavaScript and rendering DOM without a visual interface.
Scrapling — Adaptive Web Scraping Framework for Python
An intelligent Python web scraping framework that handles single requests to full-scale crawls with built-in anti-detection and auto-adaptation.
ScrapeGraphAI — AI-Powered Web Scraping
Python scraping library powered by LLMs. Describe what you want to extract in natural language, get structured data back. Handles dynamic pages. 23K+ stars.