Scripts · Apr 7, 2026 · 1 min read

Crawlee — Web Scraping and Browser Automation Library

Build reliable web scrapers in Node.js or Python. Crawlee handles proxy rotation, browser fingerprints, auto-scaling, and anti-bot bypassing out of the box.

What is Crawlee?

Crawlee is a web scraping and browser automation library that handles the hard parts — proxy rotation, browser fingerprints, retries, auto-scaling, and storage — so you can focus on the extraction logic. Available for Node.js and Python.

Answer-Ready: Crawlee is a web scraping library for Node.js and Python that handles proxy rotation, browser fingerprints, auto-scaling, and anti-bot bypassing for reliable data extraction.

Core Features

1. Multiple Crawler Types

// HTTP crawler (fastest, for simple pages)
import { CheerioCrawler, Dataset } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ request, $ }) {
        const title = $('title').text();
        await Dataset.pushData({ url: request.url, title });
    },
});

await crawler.run(['https://example.com']);

// Browser crawler (for JS-rendered pages)
import { PlaywrightCrawler, Dataset } from 'crawlee';

const crawler = new PlaywrightCrawler({
    async requestHandler({ request, page }) {
        await page.waitForSelector('.product');
        const items = await page.$$eval('.product', els =>
            els.map(el => ({ name: el.textContent }))
        );
        await Dataset.pushData({ url: request.url, items });
    },
});

await crawler.run(['https://example.com']);

2. Anti-Bot Features

Built-in fingerprint randomization and session management:

const crawler = new PlaywrightCrawler({
    useSessionPool: true,
    sessionPoolOptions: { maxPoolSize: 100 },
    browserPoolOptions: {
        fingerprintOptions: {
            fingerprintGeneratorOptions: {
                browsers: ['chrome', 'firefox'],
            },
        },
    },
});
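Conceptually, a session pool hands out identities round-robin and retires any session that gets blocked too often, so one burned identity doesn't poison the whole crawl. Here is a minimal plain-JavaScript sketch of that idea — `SimpleSessionPool` and its methods are illustrative names, not Crawlee's actual internals:

```javascript
// Sketch of a session pool: hand out sessions round-robin,
// retire a session once it has been blocked too many times.
class SimpleSessionPool {
    constructor(maxPoolSize = 3, maxErrors = 2) {
        this.maxErrors = maxErrors;
        this.sessions = Array.from({ length: maxPoolSize }, (_, i) => ({
            id: `session-${i}`,
            errorCount: 0,
        }));
        this.cursor = 0;
    }

    // Return the next usable session, skipping retired ones.
    getSession() {
        const alive = this.sessions.filter(s => s.errorCount < this.maxErrors);
        if (alive.length === 0) throw new Error('All sessions retired');
        const session = alive[this.cursor % alive.length];
        this.cursor += 1;
        return session;
    }

    // Record a block (e.g. an HTTP 403); after maxErrors the session retires.
    markBlocked(session) {
        session.errorCount += 1;
    }
}

const pool = new SimpleSessionPool(2);
const session = pool.getSession();       // session-0
pool.markBlocked(session);
pool.markBlocked(session);               // session-0 is now retired
console.log(pool.getSession().id);       // session-1
```

Crawlee's real `SessionPool` additionally persists cookies per session and scores sessions by success rate, but the retire-on-repeated-blocks loop above is the core behavior.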

3. Proxy Rotation

import { ProxyConfiguration } from 'crawlee';

const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        'http://proxy1:8080',
        'http://proxy2:8080',
    ],
});

const crawler = new CheerioCrawler({
    proxyConfiguration,
    // Automatically rotates per request
});
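The rotation itself amounts to cycling through the list, one URL per outgoing request. A standalone sketch of that behavior (`RoundRobinProxies` is an illustrative name, not part of Crawlee's API):

```javascript
// Round-robin proxy rotation: each request gets the next URL in the list.
class RoundRobinProxies {
    constructor(proxyUrls) {
        this.proxyUrls = proxyUrls;
        this.nextIndex = 0;
    }

    // Called once per outgoing request.
    newUrl() {
        const url = this.proxyUrls[this.nextIndex % this.proxyUrls.length];
        this.nextIndex += 1;
        return url;
    }
}

const proxies = new RoundRobinProxies(['http://proxy1:8080', 'http://proxy2:8080']);
console.log(proxies.newUrl()); // http://proxy1:8080
console.log(proxies.newUrl()); // http://proxy2:8080
console.log(proxies.newUrl()); // http://proxy1:8080 again
```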

4. Auto-Scaling

Adjusts concurrency based on system resources and target site response:

const crawler = new CheerioCrawler({
    minConcurrency: 1,
    maxConcurrency: 100,
    // Auto-scales between these limits
});
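One simple way to picture the scaling loop: raise concurrency while the system is healthy, cut it sharply when overloaded, and clamp to the configured bounds. The sketch below illustrates that policy in one step function — it is not Crawlee's `AutoscaledPool` implementation, just the general additive-increase/multiplicative-decrease idea:

```javascript
// One auto-scaling step: probe upward gently when healthy,
// back off hard when overloaded, clamp to [min, max].
function nextConcurrency(current, { min, max, overloaded }) {
    const desired = overloaded
        ? Math.floor(current / 2)   // multiplicative decrease under load
        : current + 1;              // additive increase when healthy
    return Math.min(max, Math.max(min, desired));
}

let concurrency = 1;
const limits = { min: 1, max: 100 };
concurrency = nextConcurrency(concurrency, { ...limits, overloaded: false }); // 2
concurrency = nextConcurrency(concurrency, { ...limits, overloaded: false }); // 3
concurrency = nextConcurrency(concurrency, { ...limits, overloaded: true });  // 1
console.log(concurrency); // 1
```

Crawlee feeds real signals into this decision — event-loop lag, memory pressure, and the target's error responses — but the clamped increase/decrease loop is the shape of it.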

5. Built-in Storage

// Dataset for structured data
await Dataset.pushData({ title, price, url });
await Dataset.exportToCSV('results');

// Key-value store for files
await KeyValueStore.setValue('screenshot', buffer, { contentType: 'image/png' });

// Request queue for URLs
const queue = await RequestQueue.open();
await queue.addRequest({ url: 'https://...' });

Python Version

import asyncio

from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext

async def main() -> None:
    crawler = PlaywrightCrawler()

    @crawler.router.default_handler
    async def handler(context: PlaywrightCrawlingContext) -> None:
        title = await context.page.title()
        await context.push_data({'title': title})

    await crawler.run(['https://example.com'])

asyncio.run(main())

FAQ

Q: How does it compare to Scrapy? A: Crawlee offers first-class browser support and built-in anti-bot features, and it ships for both JavaScript and Python. Scrapy is Python-only and HTTP-focused.

Q: Is it from the Apify team? A: Yes, Crawlee is open-source by Apify. It can run standalone or deploy to Apify cloud.

Q: Can it handle SPAs? A: Yes, PlaywrightCrawler renders JavaScript and waits for dynamic content.
