# Crawlee Crawler Types & Features

## Four Crawler Types
| Crawler | Engine | Best For | Speed |
|---|---|---|---|
| CheerioCrawler | HTTP + Cheerio | Static HTML pages | Fastest |
| PlaywrightCrawler | Playwright browser | JavaScript-heavy SPAs | Medium |
| PuppeteerCrawler | Puppeteer browser | Chrome-specific features | Medium |
| AdaptivePlaywrightCrawler | Auto-switching (HTTP/browser) | Mixed content sites | Adaptive |
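The adaptive crawler's trick is deciding, per page, whether a plain HTTP request is enough or a real browser is needed. The gist can be sketched in a few lines of stand-alone TypeScript (a hypothetical heuristic, not Crawlee's actual detection logic):

```typescript
// Try cheap HTTP-only parsing first; fall back to a browser when the static
// HTML clearly lacks the content we need. Purely illustrative.
type FetchMode = 'http' | 'browser';

function chooseMode(staticHtml: string, requiredMarker: string): FetchMode {
  // If the marker is already in the raw HTML, the page is server-rendered
  // and an HTTP crawler suffices; otherwise assume client-side rendering.
  return staticHtml.includes(requiredMarker) ? 'http' : 'browser';
}

// Server-rendered product page: the price is already in the HTML.
console.log(chooseMode('<div class="price">$10</div>', 'class="price"')); // http

// SPA shell: the content only appears after client-side rendering.
console.log(chooseMode('<div id="root"></div>', 'class="price"')); // browser
```

Crawlee's real detection is more sophisticated than a marker check, but the payoff is the same: you only pay for browser sessions on pages that need them.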
## CheerioCrawler (Fast HTTP)

For static pages, with no browser overhead:

```typescript
import { CheerioCrawler, Dataset } from 'crawlee';

const crawler = new CheerioCrawler({
  async requestHandler({ request, $ }) {
    const title = $('h1').text();
    const prices = $('span.price')
      .map((_, el) => $(el).text())
      .get();
    await Dataset.pushData({ url: request.url, title, prices });
  },
});

await crawler.run(['https://example.com']);
```

## PlaywrightCrawler (Browser)
For JavaScript-rendered content:

```typescript
import { PlaywrightCrawler, Dataset } from 'crawlee';

const crawler = new PlaywrightCrawler({
  headless: true,
  async requestHandler({ page, request }) {
    // Wait for dynamic content
    await page.waitForSelector('.product-list');

    // Scroll to trigger lazy loading
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
    await page.waitForTimeout(2000);

    const products = await page.$$eval('.product', (items) =>
      items.map((item) => ({
        name: item.querySelector('.name')?.textContent,
        price: item.querySelector('.price')?.textContent,
      }))
    );
    await Dataset.pushData({ url: request.url, products });
  },
});
```

## Proxy Rotation
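At its simplest, rotation just cycles outgoing requests through a pool of proxy URLs. A stand-alone sketch of that idea (illustrative only, not Crawlee's internals):

```typescript
// Toy round-robin rotation over a proxy pool. Crawlee's ProxyConfiguration
// handles this for you, along with session-to-proxy affinity.
class ProxyPool {
  private index = 0;
  constructor(private readonly urls: string[]) {}

  next(): string {
    const url = this.urls[this.index % this.urls.length];
    this.index += 1;
    return url;
  }
}

const pool = new ProxyPool(['http://proxy1:8080', 'http://proxy2:8080']);
console.log(pool.next()); // http://proxy1:8080
console.log(pool.next()); // http://proxy2:8080
console.log(pool.next()); // http://proxy1:8080 (wraps around)
```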
Built-in proxy management with session persistence:

```typescript
import { PlaywrightCrawler, ProxyConfiguration } from 'crawlee';

const proxyConfiguration = new ProxyConfiguration({
  proxyUrls: [
    'http://proxy1:8080',
    'http://proxy2:8080',
    'http://proxy3:8080',
  ],
});

const crawler = new PlaywrightCrawler({
  proxyConfiguration,
  sessionPoolOptions: { maxPoolSize: 100 },
});
```

## Request Queue & Auto-Retry
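The retry contract is simple: a request is attempted once, then retried up to `maxRequestRetries` more times before being handed to `failedRequestHandler`. A minimal stand-alone sketch of those semantics (not Crawlee's implementation):

```typescript
// Attempt fn once, then retry up to maxRetries more times.
// Mirrors the maxRequestRetries contract; illustrative only.
function withRetries<T>(fn: () => T, maxRetries: number): T {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return fn();
    } catch (err) {
      lastError = err; // this attempt failed; retry if budget remains
    }
  }
  // Budget exhausted: in Crawlee, this is where failedRequestHandler runs.
  throw lastError;
}

// Succeeds on the third attempt (the first two throw).
let calls = 0;
const result = withRetries(() => {
  calls += 1;
  if (calls < 3) throw new Error('flaky');
  return 'ok';
}, 3);
console.log(result, calls); // ok 3
```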
The request queue is persisted to disk, so a crawl survives crashes and restarts; retry behavior is configurable:

```typescript
const crawler = new PlaywrightCrawler({
  maxRequestRetries: 3,
  requestHandlerTimeoutSecs: 60,
  maxConcurrency: 10,
  async requestHandler({ request }) { /* ... */ },
  async failedRequestHandler({ request }) {
    console.log(`Failed after retries: ${request.url}`);
  },
});
```

## Dataset Storage
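Conceptually, a dataset is just an append-only list of JSON records. A toy in-memory stand-in (hypothetical class, not the real API) shows the shape of the abstraction:

```typescript
// Toy stand-in for Crawlee's Dataset: records are only ever appended, and
// the whole collection can be serialized for export. Illustrative only.
class ToyDataset {
  private records: Record<string, unknown>[] = [];

  pushData(item: Record<string, unknown>): void {
    this.records.push(item);
  }

  toJSON(): string {
    return JSON.stringify(this.records, null, 2);
  }
}

const ds = new ToyDataset();
ds.pushData({ title: 'Product A', price: '$29.99' });
ds.pushData({ title: 'Product B', price: '$12.50' });
console.log(JSON.parse(ds.toJSON()).length); // 2
```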
Structured data export without external dependencies:

```typescript
import { Dataset } from 'crawlee';

// Save data
await Dataset.pushData({ title: 'Product A', price: '$29.99' });

// Export to JSON/CSV
const dataset = await Dataset.open();
await dataset.exportToJSON('output.json');
await dataset.exportToCSV('output.csv');
```

## AI/LLM Integration
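The core preprocessing step is stripping boilerplate elements and then collapsing whitespace. That second step as a stand-alone helper (hypothetical name, extracted for clarity):

```typescript
// Collapse runs of whitespace (newlines, tabs, multiple spaces) into single
// spaces and trim the ends, producing compact text for LLM ingestion.
function normalizeText(raw: string): string {
  return raw.replace(/\s+/g, ' ').trim();
}

console.log(normalizeText('  Product\n\n  Details\t here  '));
// "Product Details here"
```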
Feed crawled data directly to AI pipelines:

```typescript
import { CheerioCrawler, Dataset } from 'crawlee';

const crawler = new CheerioCrawler({
  async requestHandler({ $, request }) {
    // Strip boilerplate, then extract clean text for LLM consumption
    $('nav, footer, script, style').remove();
    const cleanText = $('body').text().replace(/\s+/g, ' ').trim();

    await Dataset.pushData({
      url: request.url,
      content: cleanText, // ready for RAG ingestion
    });
  },
});
```

## FAQ
**Q: What is Crawlee?**
A: Crawlee is a Node.js/TypeScript web scraping and browser automation library by Apify with 22,600+ GitHub stars. It provides HTTP- and browser-based crawlers with built-in proxy rotation, request queuing, and auto-retries for production use.

**Q: How is Crawlee different from Puppeteer or Playwright alone?**
A: Crawlee adds production features on top of Puppeteer/Playwright: request queuing, automatic retries, proxy rotation, session management, and structured storage. Raw Puppeteer/Playwright are browser automation tools; Crawlee is a complete crawling framework.

**Q: Is Crawlee free?**
A: Yes, it is fully free and open source under the Apache-2.0 license. Apify offers optional cloud hosting for running crawlers at scale, but the library itself is completely free.