Scripts · Apr 7, 2026 · 1 min read

Crawlee — Web Scraping and Browser Automation Library

Build reliable web scrapers in Node.js or Python. Crawlee handles proxy rotation, browser fingerprints, auto-scaling, and anti-bot bypassing out of the box.

MCP Hub · Community
Quick Use

Use it first, then decide how deep to go

Run the commands below to scaffold a project and try Crawlee before committing to it.

npx crawlee create my-scraper
cd my-scraper
npm start

Or in Python:

pip install 'crawlee[playwright]'
playwright install  # download the browser binaries Playwright needs

What is Crawlee?

Crawlee is a web scraping and browser automation library that handles the hard parts — proxy rotation, browser fingerprints, retries, auto-scaling, and storage — so you can focus on the extraction logic. Available for Node.js and Python.

Answer-Ready: Crawlee is a web scraping library for Node.js and Python that handles proxy rotation, browser fingerprints, auto-scaling, and anti-bot bypassing for reliable data extraction.

Core Features

1. Multiple Crawler Types

// HTTP crawler (fastest, for simple pages)
import { CheerioCrawler, Dataset } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ request, $ }) {
        const title = $('title').text();
        await Dataset.pushData({ url: request.url, title });
    },
});

await crawler.run(['https://example.com']);

// Browser crawler (for JS-rendered pages)
import { PlaywrightCrawler, Dataset } from 'crawlee';

const crawler = new PlaywrightCrawler({
    async requestHandler({ page }) {
        await page.waitForSelector('.product');
        const items = await page.$$eval('.product', els =>
            els.map(el => ({ name: el.textContent }))
        );
        await Dataset.pushData(items);
    },
});

await crawler.run(['https://example.com']);
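The difference between the two crawlers is only where the HTML comes from (a plain HTTP response vs. a fully rendered page); the extraction step itself is ordinary DOM querying. As a dependency-free illustration of what `$('title').text()` boils down to (a naive sketch, not how Cheerio actually parses):

```javascript
// Naive title extraction from raw HTML (illustration only — real parsing
// should use a proper HTML parser like Cheerio).
function extractTitle(html) {
    const match = /<title[^>]*>([^<]*)<\/title>/i.exec(html);
    return match ? match[1].trim() : null;
}

console.log(extractTitle('<html><head><title>Example Domain</title></head></html>'));
// Example Domain
```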

2. Anti-Bot Features

Built-in fingerprint randomization and session management:

const crawler = new PlaywrightCrawler({
    useSessionPool: true,
    sessionPoolOptions: { maxPoolSize: 100 },
    browserPoolOptions: {
        fingerprintOptions: {
            fingerprintGeneratorOptions: {
                browsers: ['chrome', 'firefox'],
            },
        },
    },
});
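A session pool pairs each request with a reusable identity (cookies, fingerprint, proxy) and retires identities the target site starts blocking, so one burned session doesn't poison the rest. A toy model of the idea (assumed behavior for illustration, not Crawlee's actual `SessionPool` implementation):

```javascript
// Toy session pool: hands out sessions up to a cap, reuses existing ones
// past the cap, and retires sessions that get blocked (illustration only).
function createSessionPool(maxPoolSize) {
    const sessions = [];
    let counter = 0;
    return {
        getSession() {
            // Create a fresh session while under the cap, otherwise reuse one.
            if (sessions.length < maxPoolSize) {
                const session = { id: ++counter, blocked: false };
                sessions.push(session);
                return session;
            }
            return sessions[Math.floor(Math.random() * sessions.length)];
        },
        retire(session) {
            // Drop a session the target site has blocked.
            const i = sessions.indexOf(session);
            if (i !== -1) sessions.splice(i, 1);
        },
        get size() {
            return sessions.length;
        },
    };
}
```

With `maxPoolSize: 100` as in the snippet above, Crawlee spreads traffic over up to 100 such identities instead of presenting one fingerprint for every request.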

3. Proxy Rotation

import { ProxyConfiguration } from 'crawlee';

const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        'http://proxy1:8080',
        'http://proxy2:8080',
    ],
});

const crawler = new CheerioCrawler({
    proxyConfiguration,
    // Automatically rotates per request
});
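Under the hood, per-request rotation amounts to cycling through the pool. A minimal round-robin sketch of that idea in plain JavaScript (illustrative, not Crawlee's actual `ProxyConfiguration` internals):

```javascript
// Minimal round-robin proxy rotation (illustration only).
function createProxyRotator(proxyUrls) {
    let next = 0;
    return {
        // Return the next proxy URL, wrapping around at the end of the list.
        newUrl() {
            const url = proxyUrls[next % proxyUrls.length];
            next += 1;
            return url;
        },
    };
}

const rotator = createProxyRotator(['http://proxy1:8080', 'http://proxy2:8080']);
console.log(rotator.newUrl()); // http://proxy1:8080
console.log(rotator.newUrl()); // http://proxy2:8080
console.log(rotator.newUrl()); // wraps back to http://proxy1:8080
```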

4. Auto-Scaling

Crawlee adjusts concurrency automatically based on available system resources (CPU, memory) and how the target site is responding:

const crawler = new CheerioCrawler({
    minConcurrency: 1,
    maxConcurrency: 100,
    // Auto-scales between these limits
});
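The scaling behavior can be pictured as a feedback loop: nudge concurrency up while the system has headroom, nudge it down when overloaded, and clamp the result to the configured bounds. A simplified sketch of one step of that loop (assumed logic for illustration, not Crawlee's actual autoscaler):

```javascript
// One autoscaling step: adjust concurrency based on an overload signal,
// clamped to [minConcurrency, maxConcurrency] (illustration only).
function nextConcurrency(current, { overloaded, minConcurrency, maxConcurrency }) {
    const desired = overloaded ? current - 1 : current + 1;
    return Math.min(maxConcurrency, Math.max(minConcurrency, desired));
}

let concurrency = 1;
concurrency = nextConcurrency(concurrency, { overloaded: false, minConcurrency: 1, maxConcurrency: 100 }); // 2
concurrency = nextConcurrency(concurrency, { overloaded: true, minConcurrency: 1, maxConcurrency: 100 });  // back to 1
```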

5. Built-in Storage

import { Dataset, KeyValueStore, RequestQueue } from 'crawlee';

// Dataset for structured data
await Dataset.pushData({ title, price, url });
await Dataset.exportToCSV('results');

// Key-value store for files
await KeyValueStore.setValue('screenshot', buffer, { contentType: 'image/png' });

// Request queue for URLs
const requestQueue = await RequestQueue.open();
await requestQueue.addRequest({ url: 'https://...' });
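Conceptually, a `Dataset` is just an append-only table that can be serialized out at the end of a run. A dependency-free sketch of pushData plus CSV export (illustration only, not Crawlee's storage layer, which persists to disk or the Apify platform):

```javascript
// Tiny in-memory dataset: append records, export as CSV (illustration only).
function createDataset() {
    const records = [];
    return {
        pushData(item) {
            records.push(item);
        },
        toCSV() {
            if (records.length === 0) return '';
            const headers = Object.keys(records[0]);
            const rows = records.map(r => headers.map(h => String(r[h])).join(','));
            return [headers.join(','), ...rows].join('\n');
        },
    };
}

const ds = createDataset();
ds.pushData({ title: 'Example', price: 9.99 });
console.log(ds.toCSV());
// title,price
// Example,9.99
```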

Python Version

import asyncio

from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext


async def main() -> None:
    crawler = PlaywrightCrawler()

    @crawler.router.default_handler
    async def handler(context: PlaywrightCrawlingContext) -> None:
        title = await context.page.title()
        await context.push_data({'title': title})

    await crawler.run(['https://example.com'])


asyncio.run(main())

FAQ

Q: How does it compare to Scrapy?
A: Crawlee has first-class browser support, built-in anti-bot features, and works in both JavaScript and Python. Scrapy is Python-only and HTTP-focused.

Q: Is it from the Apify team?
A: Yes, Crawlee is open-source software by Apify. It can run standalone or deploy to the Apify cloud.

Q: Can it handle SPAs?
A: Yes, PlaywrightCrawler renders JavaScript and waits for dynamic content.
