Scripts · Apr 7, 2026 · 1 min read

Crawlee — Web Scraping and Browser Automation Library

Build reliable web scrapers in Node.js or Python. Crawlee handles proxy rotation, browser fingerprints, auto-scaling, and anti-bot bypassing out of the box.

TL;DR
Crawlee handles proxy rotation, browser fingerprinting, and anti-bot bypassing out of the box, so you can build reliable scrapers fast.
§01

What it is

Crawlee is a web scraping and browser automation library for Node.js and Python. It handles the hard parts of web scraping: proxy rotation, browser fingerprints, automatic retries, request queuing, auto-scaling, and anti-bot bypassing. Crawlee supports HTTP crawling (Cheerio/BeautifulSoup), headless browsers (Playwright/Puppeteer), and adaptive switching between modes.

Crawlee targets developers building production web scrapers who need reliability and scale. Instead of writing retry logic, proxy management, and fingerprint rotation from scratch, Crawlee provides these as built-in features.

§02

How it saves time or tokens

Building a reliable web scraper means handling rate limiting, CAPTCHAs, IP blocks, JavaScript rendering, and data extraction. Crawlee bundles all of these concerns into a single library. The auto-scaling feature adjusts concurrency based on system resources and target server response times. Proxy rotation and browser fingerprint management reduce blocks without custom code.

§03

How to use

  1. Create a new scraper:
npx crawlee create my-scraper
cd my-scraper
npm start
  2. Or install manually:
npm install crawlee playwright
  3. Python version:
pip install crawlee[playwright]
§04

Example

import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
  maxRequestsPerCrawl: 100,
  async requestHandler({ page, request, enqueueLinks }) {
    const title = await page.title();
    // $eval throws if the selector is missing; treat a missing price as null
    const price = await page
      .$eval('.price', (el) => el.textContent?.trim())
      .catch(() => null);

    console.log(`${request.url}: ${title} - ${price}`);

    // Follow pagination links
    await enqueueLinks({
      selector: '.pagination a',
      strategy: 'same-domain',
    });
  },
});

await crawler.run(['https://example.com/products']);

# Python version
import asyncio

from crawlee.playwright_crawler import PlaywrightCrawler

crawler = PlaywrightCrawler(max_requests_per_crawl=100)

@crawler.router.default_handler
async def handler(context):
    title = await context.page.title()
    context.log.info(f'{context.request.url}: {title}')
    await context.enqueue_links(strategy='same-domain')

# top-level await is not valid in a plain Python script; run via asyncio
asyncio.run(crawler.run(['https://example.com']))
§05

Related on TokRepo

Crawlee is open-source under the Apache 2.0 license, maintained by Apify, and follows semantic versioning for stable releases. It works with standard Node.js and Python tooling, needs minimal configuration to get started, and has documentation and community support through the official repository.

For teams evaluating Crawlee, the key advantage is that retry logic, proxy rotation, fingerprint management, and request queuing come built in. Less custom scraping code to maintain and fewer integration points to manage translates directly into lower maintenance costs and faster iteration cycles.

§06

Common pitfalls

  • PlaywrightCrawler launches real browsers which consume significant memory; use CheerioCrawler for pages that do not require JavaScript rendering.
  • Proxy rotation requires proxy URLs configured in the crawler options; Crawlee does not provide proxies, only the rotation logic.
  • Respect robots.txt and website terms of service; Crawlee provides the technical capability but compliance is your responsibility.

Frequently Asked Questions

What is the difference between CheerioCrawler and PlaywrightCrawler?

CheerioCrawler makes HTTP requests and parses HTML with Cheerio (no browser). PlaywrightCrawler launches a headless browser for pages that require JavaScript rendering. Use CheerioCrawler when possible for better performance and lower resource usage.

Does Crawlee handle anti-bot protection?

Yes. Crawlee includes browser fingerprint rotation, request header randomization, and session management to reduce detection. For advanced anti-bot systems, combine with proxy rotation and human-like browsing patterns.

Does Crawlee support Python?

Yes. Crawlee has official Python support with the same features as the Node.js version. Install with pip install crawlee[playwright] for browser-based scraping or crawlee[beautifulsoup] for HTTP scraping.

Can Crawlee scale to large websites?

Yes. Crawlee includes auto-scaling that adjusts concurrency based on system resources and server response times. The request queue handles millions of URLs with automatic deduplication and retry logic.

Is Crawlee free?

Yes. Crawlee is open-source under the Apache 2.0 license. It is developed by Apify but can be used independently without an Apify account. Apify offers a managed platform for running Crawlee scrapers in the cloud.
