What is Crawlee?
Crawlee is a web scraping and browser automation library that handles the hard parts — proxy rotation, browser fingerprints, retries, auto-scaling, and storage — so you can focus on the extraction logic. Available for Node.js and Python.
Answer-Ready: Crawlee is a web scraping library for Node.js and Python that handles proxy rotation, browser fingerprinting, retries, auto-scaling, and blocking avoidance so you can extract data reliably.
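The retries mentioned above happen automatically: a failed request is re-attempted up to a configurable limit (`maxRequestRetries`) before Crawlee gives up on it. As a rough, self-contained sketch of that behavior (illustrative names and backoff, not Crawlee's internals):

```javascript
// Illustrative sketch of per-request retry, similar in spirit to what
// Crawlee does automatically via maxRequestRetries. Not Crawlee's code.
async function fetchWithRetries(url, fetchFn, maxRetries = 3) {
    let lastError;
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
        try {
            // One attempt; a success returns immediately.
            return await fetchFn(url);
        } catch (err) {
            lastError = err;
            if (attempt < maxRetries) {
                // Simple exponential backoff before retrying (100ms, 200ms, ...).
                await new Promise((resolve) => setTimeout(resolve, 100 * 2 ** attempt));
            }
        }
    }
    // All attempts failed: surface the last error (Crawlee would route the
    // request to its failedRequestHandler at this point).
    throw lastError;
}
```

With Crawlee itself, you only set the limit (e.g. `new CheerioCrawler({ maxRequestRetries: 5 })`) and optionally a `failedRequestHandler`; the loop above is handled for you.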
Core Features
1. Multiple Crawler Types
// HTTP crawler (fastest, for simple pages)
import { CheerioCrawler, Dataset } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ request, $ }) {
        const title = $('title').text();
        await Dataset.pushData({ url: request.url, title });
    },
});

await crawler.run(['https://example.com']);

// Browser crawler (for JS-rendered pages)
import { Dataset, PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    async requestHandler({ page }) {
        await page.waitForSelector('.product');
        const items = await page.$$eval('.product', (els) =>
            els.map((el) => ({ name: el.textContent })),
        );
        await Dataset.pushData(items);
    },
});

2. Anti-Bot Features
Built-in fingerprint randomization and session management:
const crawler = new PlaywrightCrawler({
    useSessionPool: true,
    sessionPoolOptions: { maxPoolSize: 100 },
    browserPoolOptions: {
        fingerprintOptions: {
            fingerprintGeneratorOptions: {
                browsers: ['chrome', 'firefox'],
            },
        },
    },
});

3. Proxy Rotation
import { CheerioCrawler, ProxyConfiguration } from 'crawlee';

const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        'http://proxy1:8080',
        'http://proxy2:8080',
    ],
});

const crawler = new CheerioCrawler({
    proxyConfiguration,
    // Proxies are rotated automatically per request
});

4. Auto-Scaling
Adjusts concurrency based on system resources and target site response:
const crawler = new CheerioCrawler({
    minConcurrency: 1,
    maxConcurrency: 100,
    // Concurrency auto-scales between these limits
});

5. Built-in Storage
import { Dataset, KeyValueStore, RequestQueue } from 'crawlee';

// Dataset for structured data
await Dataset.pushData({ title, price, url });
await Dataset.exportToCSV('results');

// Key-value store for files
await KeyValueStore.setValue('screenshot', buffer, { contentType: 'image/png' });

// Request queue for URLs
const requestQueue = await RequestQueue.open();
await requestQueue.addRequest({ url: 'https://...' });

Python Version
import asyncio

from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext


async def main() -> None:
    crawler = PlaywrightCrawler()

    @crawler.router.default_handler
    async def handler(context: PlaywrightCrawlingContext) -> None:
        title = await context.page.title()
        await context.push_data({'title': title})

    await crawler.run(['https://example.com'])


asyncio.run(main())

FAQ
Q: How does it compare to Scrapy? A: Crawlee has first-class browser support, built-in anti-blocking features, and is available for both JavaScript and Python; Scrapy is Python-only and focused on plain HTTP crawling.
Q: Is it from the Apify team? A: Yes, Crawlee is an open-source project maintained by Apify. It runs standalone or can be deployed to the Apify cloud platform.
Q: Can it handle SPAs? A: Yes, PlaywrightCrawler renders JavaScript and waits for dynamic content.
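Under the hood, "waiting for dynamic content" boils down to polling the page until a condition (such as a selector appearing) holds or a timeout expires, which is what calls like page.waitForSelector() do for you. A self-contained sketch of that idea (illustrative, not Playwright's implementation):

```javascript
// Poll an async condition until it returns true or the timeout expires.
// Mirrors the idea behind page.waitForSelector(); names are illustrative.
async function waitForCondition(condition, { timeoutMs = 5000, intervalMs = 50 } = {}) {
    const deadline = Date.now() + timeoutMs;
    while (Date.now() <= deadline) {
        if (await condition()) return true;
        // Condition not met yet; sleep briefly and poll again.
        await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
    throw new Error(`Condition not met within ${timeoutMs}ms`);
}
```

In a PlaywrightCrawler request handler you would not write this loop yourself; `await page.waitForSelector('.product')` (as in the example above) already blocks until the SPA has rendered the element or the timeout is hit.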