Scripts · Apr 7, 2026 · 1 min read

Crawlee — Web Scraping and Browser Automation Library

Build reliable web scrapers in Node.js or Python. Crawlee handles proxy rotation, browser fingerprints, auto-scaling, and anti-bot bypassing out of the box.

What is Crawlee?

Crawlee is a web scraping and browser automation library that handles the hard parts — proxy rotation, browser fingerprints, retries, auto-scaling, and storage — so you can focus on the extraction logic. Available for Node.js and Python.

Answer-Ready: Crawlee is a web scraping library for Node.js and Python that handles proxy rotation, browser fingerprints, auto-scaling, and anti-bot bypassing for reliable data extraction.

Core Features

1. Multiple Crawler Types

// HTTP crawler (fastest, for simple pages)
import { CheerioCrawler, Dataset } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ request, $ }) {
        const title = $('title').text();
        await Dataset.pushData({ url: request.url, title });
    },
});

await crawler.run(['https://example.com']);

// Browser crawler (for JS-rendered pages)
import { PlaywrightCrawler, Dataset } from 'crawlee';

const crawler = new PlaywrightCrawler({
    async requestHandler({ request, page }) {
        await page.waitForSelector('.product');
        const items = await page.$$eval('.product', els =>
            els.map(el => ({ name: el.textContent }))
        );
        await Dataset.pushData({ url: request.url, items });
    },
});

await crawler.run(['https://example.com']);

2. Anti-Bot Features

Built-in fingerprint randomization and session management:

const crawler = new PlaywrightCrawler({
    useSessionPool: true,
    sessionPoolOptions: { maxPoolSize: 100 },
    browserPoolOptions: {
        fingerprintOptions: {
            fingerprintGeneratorOptions: {
                browsers: ['chrome', 'firefox'],
            },
        },
    },
});
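Conceptually, a session pool hands out identities round-robin and retires any session that gets blocked too often, so one burned identity doesn't poison the whole crawl. Here is a minimal plain-JavaScript sketch of that idea — `SimpleSessionPool` and its methods are illustrative names, not Crawlee's actual internals:

```javascript
// Sketch of a session pool: hand out sessions round-robin,
// retire a session once it has been blocked too many times.
class SimpleSessionPool {
    constructor(maxPoolSize = 3, maxErrors = 2) {
        this.maxErrors = maxErrors;
        this.sessions = Array.from({ length: maxPoolSize }, (_, i) => ({
            id: `session-${i}`,
            errorCount: 0,
        }));
        this.cursor = 0;
    }

    // Return the next usable session, skipping retired ones.
    getSession() {
        const alive = this.sessions.filter(s => s.errorCount < this.maxErrors);
        if (alive.length === 0) throw new Error('All sessions retired');
        const session = alive[this.cursor % alive.length];
        this.cursor += 1;
        return session;
    }

    // Record a block (e.g. an HTTP 403); after maxErrors the session retires.
    markBlocked(session) {
        session.errorCount += 1;
    }
}

const pool = new SimpleSessionPool(2);
const session = pool.getSession();       // session-0
pool.markBlocked(session);
pool.markBlocked(session);               // session-0 is now retired
console.log(pool.getSession().id);       // session-1
```

Crawlee's real `SessionPool` additionally persists cookies per session and scores sessions by success rate, but the retire-on-repeated-blocks loop above is the core behavior.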

3. Proxy Rotation

import { ProxyConfiguration } from 'crawlee';

const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        'http://proxy1:8080',
        'http://proxy2:8080',
    ],
});

const crawler = new CheerioCrawler({
    proxyConfiguration,
    // Automatically rotates per request
});
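The rotation itself amounts to cycling through the list, one URL per outgoing request. A standalone sketch of that behavior (`RoundRobinProxies` is an illustrative name, not part of Crawlee's API):

```javascript
// Round-robin proxy rotation: each request gets the next URL in the list.
class RoundRobinProxies {
    constructor(proxyUrls) {
        this.proxyUrls = proxyUrls;
        this.nextIndex = 0;
    }

    // Called once per outgoing request.
    newUrl() {
        const url = this.proxyUrls[this.nextIndex % this.proxyUrls.length];
        this.nextIndex += 1;
        return url;
    }
}

const proxies = new RoundRobinProxies(['http://proxy1:8080', 'http://proxy2:8080']);
console.log(proxies.newUrl()); // http://proxy1:8080
console.log(proxies.newUrl()); // http://proxy2:8080
console.log(proxies.newUrl()); // http://proxy1:8080 again
```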

4. Auto-Scaling

Adjusts concurrency based on system resources and target site response:

const crawler = new CheerioCrawler({
    minConcurrency: 1,
    maxConcurrency: 100,
    // Auto-scales between these limits
});
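One simple way to picture the scaling loop: raise concurrency while the system is healthy, cut it sharply when overloaded, and clamp to the configured bounds. The sketch below illustrates that policy in one step function — it is not Crawlee's `AutoscaledPool` implementation, just the general additive-increase/multiplicative-decrease idea:

```javascript
// One auto-scaling step: probe upward gently when healthy,
// back off hard when overloaded, clamp to [min, max].
function nextConcurrency(current, { min, max, overloaded }) {
    const desired = overloaded
        ? Math.floor(current / 2)   // multiplicative decrease under load
        : current + 1;              // additive increase when healthy
    return Math.min(max, Math.max(min, desired));
}

let concurrency = 1;
const limits = { min: 1, max: 100 };
concurrency = nextConcurrency(concurrency, { ...limits, overloaded: false }); // 2
concurrency = nextConcurrency(concurrency, { ...limits, overloaded: false }); // 3
concurrency = nextConcurrency(concurrency, { ...limits, overloaded: true });  // 1
console.log(concurrency); // 1
```

Crawlee feeds real signals into this decision — event-loop lag, memory pressure, and the target's error responses — but the clamped increase/decrease loop is the shape of it.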

5. Built-in Storage

// Dataset for structured data
await Dataset.pushData({ title, price, url });
await Dataset.exportToCSV('results');

// Key-value store for files
await KeyValueStore.setValue('screenshot', buffer, { contentType: 'image/png' });

// Request queue for URLs
const queue = await RequestQueue.open();
await queue.addRequest({ url: 'https://...' });

Python Version

import asyncio

from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext

async def main() -> None:
    crawler = PlaywrightCrawler()

    @crawler.router.default_handler
    async def handler(context: PlaywrightCrawlingContext) -> None:
        title = await context.page.title()
        await context.push_data({'title': title})

    await crawler.run(['https://example.com'])

asyncio.run(main())

FAQ

Q: How does it compare to Scrapy? A: Crawlee offers first-class browser support and built-in anti-bot features, and it ships for both JavaScript and Python. Scrapy is Python-only and HTTP-focused.

Q: Is it from the Apify team? A: Yes, Crawlee is open-source by Apify. It can run standalone or deploy to Apify cloud.

Q: Can it handle SPAs? A: Yes, PlaywrightCrawler renders JavaScript and waits for dynamic content.
