Scripts · Apr 2, 2026 · 2 min read

Crawlee — Production Web Scraping for Node.js

Build reliable crawlers with automatic proxy rotation, request queuing, and browser automation. By Apify. 22K+ stars.

Script Depot · Community

TL;DR

Crawlee is a Node.js web scraping library with proxy rotation, queuing, and adaptive crawling.

§01

What it is

Crawlee is a web scraping and browser automation library for Node.js built by Apify. It provides a unified interface for building production-grade crawlers that use raw HTTP requests parsed with Cheerio, headless browsers (Playwright or Puppeteer), or adaptive crawling that automatically switches between the two.

Crawlee is designed for developers building data pipelines for AI and LLM systems, RAG applications, and training datasets. It handles proxy rotation, request queuing, automatic retries, and persistent storage so you can focus on data extraction logic.

§02

How it saves time or tokens

Crawlee eliminates boilerplate code for proxy management, retry logic, and request queuing that every production crawler needs. Its adaptive crawling mode automatically picks the cheapest method (raw HTTP with Cheerio) when JavaScript rendering is not needed, falling back to Playwright only when required. This reduces compute costs and speeds up crawls. The built-in request queue with deduplication prevents wasted requests on already-visited pages, and automatic fingerprint rotation reduces blocking rates.
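To make the deduplication idea concrete, here is a minimal sketch of a queue that refuses URLs it has already seen. It is a simplified illustration, not Crawlee's own queue: `normalizeUrl` and `DedupQueue` are hypothetical names, and the normalization rules (drop the fragment, sort query parameters) are assumptions chosen for the example.

```javascript
// Simplified sketch: a request queue that skips URLs it has already seen.
// normalizeUrl and DedupQueue are hypothetical, not part of Crawlee.
function normalizeUrl(rawUrl) {
    const url = new URL(rawUrl);
    url.hash = '';            // fragments are never sent to the server
    url.searchParams.sort();  // ?b=2&a=1 and ?a=1&b=2 point at the same page
    return url.toString();
}

class DedupQueue {
    constructor() {
        this.seen = new Set(); // every key ever enqueued
        this.pending = [];     // URLs still waiting to be fetched
    }

    // Returns true if the URL was added, false if it was a duplicate.
    add(rawUrl) {
        const key = normalizeUrl(rawUrl);
        if (this.seen.has(key)) return false;
        this.seen.add(key);
        this.pending.push(key);
        return true;
    }

    // Returns the next URL to fetch, or undefined when the queue is empty.
    next() {
        return this.pending.shift();
    }
}
```

Because the key is the normalized URL, the same page linked with a different fragment or parameter order is fetched only once.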

§03

How to use

Create a new crawler project with the CLI scaffolding tool:

npx crawlee create my-crawler
cd my-crawler
npm start

Or add Crawlee to an existing project and write a crawler:

npm install crawlee playwright

Define your crawler with a request handler:

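import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    // Called once for every page the crawler opens.
    async requestHandler({ request, page, enqueueLinks }) {
        const title = await page.title();
        console.log(`${title} - ${request.url}`);
        // Queue links found on the page, restricted to the blog section.
        await enqueueLinks({ globs: ['https://example.com/blog/**'] });
    },
});

// Start from the blog index. Top-level await requires an ES module.
await crawler.run(['https://example.com/blog']);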

§04

Example

Extracting structured data from product pages with Cheerio (no browser needed):

import { CheerioCrawler, Dataset } from 'crawlee';

const crawler = new CheerioCrawler({
    // $ is the Cheerio handle for the downloaded HTML.
    async requestHandler({ request, $ }) {
        const products = [];
        // Collect one record per product card on the page.
        $('div.product-card').each((_, el) => {
            products.push({
                name: $(el).find('h2').text().trim(),
                price: $(el).find('.price').text().trim(),
                url: request.url,
            });
        });
        // Store the records in the default dataset.
        await Dataset.pushData(products);
    },
    // Stop after 100 requests so a test run cannot grow unbounded.
    maxRequestsPerCrawl: 100,
});

await crawler.run(['https://shop.example.com/products']);
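The example stores the price as scraped text. If downstream code needs numbers, a small cleanup step before storing the records can help. The sketch below is one possible approach, not part of Crawlee: `parsePrice` and `cleanProducts` are hypothetical helpers, and the parsing assumes US-style formatting such as "$1,299.00".

```javascript
// Hypothetical helper: turn a scraped price string into a number.
// Assumes US-style formatting ("$1,299.00"); other locales need different rules.
function parsePrice(text) {
    const cleaned = text.replace(/[^0-9.]/g, ''); // drop currency symbols, commas, spaces
    if (cleaned === '') return null;               // nothing numeric was scraped
    const value = Number.parseFloat(cleaned);
    return Number.isNaN(value) ? null : value;
}

// Drop records without a name and attach a numeric price next to the raw text.
function cleanProducts(products) {
    return products
        .filter((p) => p.name !== '')
        .map((p) => ({ ...p, priceValue: parsePrice(p.price) }));
}
```

Keeping the raw `price` string alongside `priceValue` makes it easy to spot records where the parsing assumption did not hold.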

§05

Related on TokRepo

Web scraping tools — More web scraping and data extraction tools curated on TokRepo.
Automation tools — Browse automation frameworks for data pipelines and workflows.

§06

Common pitfalls

Using PlaywrightCrawler for every page wastes resources. Start with CheerioCrawler and only switch to browser-based crawling for JavaScript-heavy sites.
Not setting maxRequestsPerCrawl can cause runaway crawls that scrape far more pages than intended. Always set a limit during development.
Ignoring the built-in session pool leads to higher blocking rates. Enable session rotation when scraping sites with rate limits.
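To show what session rotation means in practice, here is a minimal sketch of a pool that retires sessions once they are worn out or blocked. It is a simplified model and not Crawlee's session pool: the class name, the `maxPoolSize` and `maxUsageCount` options as used here, and the least-used selection rule are all assumptions made for the example.

```javascript
// Simplified sketch of session rotation; not Crawlee's session pool implementation.
class SimpleSessionPool {
    constructor({ maxPoolSize = 3, maxUsageCount = 2 } = {}) {
        this.maxPoolSize = maxPoolSize;     // how many live sessions to keep
        this.maxUsageCount = maxUsageCount; // requests allowed per session
        this.sessions = [];
        this.nextId = 1;
    }

    isUsable(session) {
        return !session.blocked && session.usageCount < this.maxUsageCount;
    }

    // Returns a usable session, creating a new one while the pool has room.
    getSession() {
        // Retire sessions that are blocked or have reached their usage limit.
        this.sessions = this.sessions.filter((s) => this.isUsable(s));
        let session;
        if (this.sessions.length < this.maxPoolSize) {
            session = { id: `session_${this.nextId++}`, usageCount: 0, blocked: false };
            this.sessions.push(session);
        } else {
            // Pool is full: reuse the least-used session (first one wins on a tie).
            session = this.sessions.reduce((a, b) => (b.usageCount < a.usageCount ? b : a));
        }
        session.usageCount += 1;
        return session;
    }

    // Call when the target site answers with a block, for example HTTP 403 or 429.
    markBlocked(session) {
        session.blocked = true;
    }
}
```

Each session would normally carry its own cookies and proxy, so retiring a blocked session also discards the identity the site has flagged.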

Frequently Asked Questions

What is the difference between CheerioCrawler and PlaywrightCrawler?

CheerioCrawler makes raw HTTP requests and parses HTML with Cheerio (jQuery-like). It is faster and uses less memory but cannot handle JavaScript-rendered content. PlaywrightCrawler runs a full headless browser, handling SPAs, dynamic content, and infinite scroll pages.

Does Crawlee handle proxy rotation automatically?

Yes. Crawlee has built-in proxy management that rotates proxies per request, handles proxy failures with automatic retries, and supports session-based proxy assignment. You provide a list of proxy URLs and Crawlee manages the rotation.
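The rotation and session-based assignment described above can be sketched in a few lines. In Crawlee this is configured through its `ProxyConfiguration` class; the code below is not that implementation, only a dependency-free illustration. `ProxyRotator`, `nextUrl`, and the proxy URLs are hypothetical.

```javascript
// Simplified sketch of proxy rotation; Crawlee's own proxy management is more involved.
class ProxyRotator {
    constructor(proxyUrls) {
        if (proxyUrls.length === 0) throw new Error('At least one proxy URL is required');
        this.proxyUrls = proxyUrls;
        this.cursor = 0;
        this.bySession = new Map(); // sessionId -> proxy URL pinned to that session
    }

    // Without a sessionId, proxies are handed out round-robin.
    // With a sessionId, the same session always gets the same proxy.
    nextUrl(sessionId) {
        if (sessionId !== undefined && this.bySession.has(sessionId)) {
            return this.bySession.get(sessionId);
        }
        const url = this.proxyUrls[this.cursor];
        this.cursor = (this.cursor + 1) % this.proxyUrls.length;
        if (sessionId !== undefined) this.bySession.set(sessionId, url);
        return url;
    }
}
```

Pinning a proxy to a session keeps the IP address and cookies consistent, which tends to look less suspicious to the target site than an identity that changes IP on every request.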

Can Crawlee be used to feed data into AI and LLM pipelines?

Yes. Crawlee is commonly used to build data ingestion pipelines for RAG systems, training datasets, and LLM context windows. The extracted data can be stored as JSON, pushed to a database, or piped directly into embedding workflows.
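As an illustration of the step between scraping and embedding, the sketch below splits page text into overlapping chunks and wraps them as JSON-ready records. It is not part of Crawlee: `chunkText`, `toChunkRecords`, the record shape, and the default sizes are assumptions, and real pipelines often chunk by tokens or sentences instead of characters.

```javascript
// Hypothetical helper: split page text into overlapping chunks for an embedding step.
function chunkText(text, { chunkSize = 500, overlap = 50 } = {}) {
    if (overlap >= chunkSize) throw new Error('overlap must be smaller than chunkSize');
    const normalized = text.replace(/\s+/g, ' ').trim(); // collapse whitespace left over from HTML
    const chunks = [];
    const step = chunkSize - overlap;
    for (let start = 0; start < normalized.length; start += step) {
        chunks.push(normalized.slice(start, start + chunkSize));
        if (start + chunkSize >= normalized.length) break; // this chunk reached the end
    }
    return chunks;
}

// Turn one scraped record ({ url, text }) into chunk records ready to serialize.
function toChunkRecords(record) {
    return chunkText(record.text).map((chunk, index) => ({
        url: record.url,
        chunkIndex: index,
        text: chunk,
    }));
}
```

Keeping the source `url` on every chunk lets a RAG system cite where a retrieved passage came from.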

How does adaptive crawling work in Crawlee?

Adaptive crawling starts with CheerioCrawler (raw HTTP) and automatically detects when a page requires JavaScript rendering. It then switches to PlaywrightCrawler for those specific pages, keeping costs low while ensuring full coverage.
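The general idea of trying the cheap path first can be sketched as follows. This is a rough illustration and not Crawlee's actual detection logic: `crawlAdaptively` and the injected `fetchStatic`, `fetchRendered`, and `extract` functions are hypothetical, and "no items found in the static HTML" is an assumed signal that rendering is needed.

```javascript
// Simplified sketch of the adaptive idea; not Crawlee's actual detection logic.
// fetchStatic, fetchRendered and extract are hypothetical functions supplied by the caller.
async function crawlAdaptively(url, { fetchStatic, fetchRendered, extract }) {
    // Try the cheap path first: plain HTTP, no browser.
    const staticHtml = await fetchStatic(url);
    const staticItems = extract(staticHtml);
    if (staticItems.length > 0) {
        return { method: 'http', items: staticItems };
    }
    // Nothing found in the static HTML, so assume the content is rendered by JavaScript.
    const renderedHtml = await fetchRendered(url);
    return { method: 'browser', items: extract(renderedHtml) };
}
```

Injecting the fetch functions keeps the decision logic separate from the HTTP client and browser, so it can be exercised with fakes before any real crawl.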

Is Crawlee related to Apify?

Crawlee is built and maintained by Apify. It works standalone as an open-source library but can also deploy to the Apify cloud platform for managed infrastructure, scheduling, and proxy pools.

Source & Thanks

Created by Apify. Licensed under Apache-2.0.

crawlee — ⭐ 22,600+

Thanks to the Apify team for building and maintaining Crawlee as an open-source web scraping framework for Node.js.