# Crawlee — Web Scraping and Browser Automation Library

> Build reliable web scrapers in Node.js or Python. Crawlee handles proxy rotation, browser fingerprints, auto-scaling, and anti-bot bypassing out of the box.

## Install

Save the content below to `.claude/skills/` or append to your `CLAUDE.md`:

## Quick Use

```bash
npx crawlee create my-scraper
cd my-scraper
npm start
```

Or in Python:

```bash
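# Quote the extra so shells such as zsh do not expand the brackets
pip install 'crawlee[playwright]'
# Download the browser binaries that Playwright drives
playwright install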
```

## What is Crawlee?

Crawlee is a web scraping and browser automation library that handles the hard parts — proxy rotation, browser fingerprints, retries, auto-scaling, and storage — so you can focus on the extraction logic. Available for Node.js and Python.

**Answer-Ready**: Crawlee is a web scraping library for Node.js and Python that handles proxy rotation, browser fingerprints, auto-scaling, and anti-bot bypassing for reliable data extraction.

## Core Features

### 1. Multiple Crawler Types

```typescript
// HTTP crawler (fastest, for simple pages)
import { CheerioCrawler, Dataset } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ request, $ }) {
        const title = $('title').text();
        await Dataset.pushData({ url: request.url, title });
    },
});

await crawler.run(['https://example.com']);
```

```typescript
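// Browser crawler (for JS-rendered pages)
import { PlaywrightCrawler, Dataset } from 'crawlee';

const crawler = new PlaywrightCrawler({
    async requestHandler({ request, page }) {
        // Wait until the JS-rendered product elements are in the DOM
        await page.waitForSelector('.product');
        const items = await page.$$eval('.product', (els) =>
            els.map((el) => ({ name: el.textContent?.trim() ?? '' })),
        );
        // Store the extracted items together with the page URL
        await Dataset.pushData({ url: request.url, items });
    },
});

await crawler.run(['https://example.com']);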
```

### 2. Anti-Bot Features
Built-in fingerprint randomization and session management:

```typescript
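import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    // Reuse a pool of sessions across requests
    useSessionPool: true,
    sessionPoolOptions: { maxPoolSize: 100 },
    browserPoolOptions: {
        fingerprintOptions: {
            // Generate fingerprints that resemble these browsers
            fingerprintGeneratorOptions: {
                browsers: ['chrome', 'firefox'],
            },
        },
    },
    // requestHandler omitted for brevity
});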
```
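
To make the idea of session management more concrete, here is a minimal conceptual sketch of a pool that hands out sessions and retires the ones that fail too often. It is not Crawlee's implementation or API: `SketchSession`, `SketchSessionPool`, and the default error threshold of 3 are hypothetical names and values chosen for illustration.

```typescript
// Conceptual sketch only: these classes are hypothetical and are not Crawlee's API.
class SketchSession {
    readonly id: string;
    readonly maxErrorScore: number;
    errorScore = 0;

    constructor(id: string, maxErrorScore: number) {
        this.id = id;
        this.maxErrorScore = maxErrorScore;
    }

    // Record a failure such as a block page or a timeout.
    markBad(): void {
        this.errorScore += 1;
    }

    // A session stays usable until it has failed too often.
    isUsable(): boolean {
        return this.errorScore < this.maxErrorScore;
    }
}

class SketchSessionPool {
    private sessions: SketchSession[] = [];
    private createdCount = 0;
    readonly maxPoolSize: number;
    readonly maxErrorScore: number;

    constructor(maxPoolSize: number, maxErrorScore = 3) {
        if (maxPoolSize < 1) {
            throw new Error('maxPoolSize must be at least 1');
        }
        this.maxPoolSize = maxPoolSize;
        this.maxErrorScore = maxErrorScore;
    }

    getSession(): SketchSession {
        // Retire sessions that have accumulated too many failures.
        this.sessions = this.sessions.filter((session) => session.isUsable());

        // Grow the pool until it is full, then reuse a random existing session.
        if (this.sessions.length < this.maxPoolSize) {
            this.createdCount += 1;
            const fresh = new SketchSession(`session_${this.createdCount}`, this.maxErrorScore);
            this.sessions.push(fresh);
            return fresh;
        }
        const index = Math.floor(Math.random() * this.sessions.length);
        return this.sessions[index];
    }
}

// Usage: with a pool of size 1 the same session is reused until it is retired.
const pool = new SketchSessionPool(1);
const first = pool.getSession();
const reused = pool.getSession();
for (let i = 0; i < 3; i += 1) first.markBad();
const replacement = pool.getSession();
console.log(first.id, reused.id, replacement.id); // session_1 session_1 session_2
```

The point of the design is that a blocked identity is dropped and replaced rather than retried indefinitely, which is the behavior the `useSessionPool` option is meant to provide.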

### 3. Proxy Rotation

```typescript
import { CheerioCrawler, ProxyConfiguration } from 'crawlee';

const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        'http://proxy1:8080',
        'http://proxy2:8080',
    ],
});

const crawler = new CheerioCrawler({
    proxyConfiguration,
    // Proxies from the list are rotated automatically across requests
    // requestHandler omitted for brevity
});
```
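
As a rough mental model of what rotating through `proxyUrls` means, the sketch below cycles through a list in round-robin order. It is an illustration under that assumption, not Crawlee's internal code, and `RoundRobinProxies` is a hypothetical class name.

```typescript
// Illustrative sketch only: a round-robin rotator, not Crawlee's internal code.
class RoundRobinProxies {
    private readonly proxyUrls: readonly string[];
    private index = 0;

    constructor(proxyUrls: readonly string[]) {
        if (proxyUrls.length === 0) {
            throw new Error('At least one proxy URL is required');
        }
        this.proxyUrls = proxyUrls;
    }

    // Return the next proxy URL, wrapping around at the end of the list.
    next(): string {
        const url = this.proxyUrls[this.index];
        this.index = (this.index + 1) % this.proxyUrls.length;
        return url;
    }
}

// Usage: three requests against two proxies wrap back to the first proxy.
const rotator = new RoundRobinProxies(['http://proxy1:8080', 'http://proxy2:8080']);
const picks = [rotator.next(), rotator.next(), rotator.next()];
console.log(picks);
```

With two proxies, the third pick is `http://proxy1:8080` again, so load is spread evenly across the list.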

### 4. Auto-Scaling
Adjusts concurrency based on system resources and target site response:

```typescript
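import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    // Concurrency is scaled automatically between these limits
    minConcurrency: 1,
    maxConcurrency: 100,
    // requestHandler omitted for brevity
});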
```
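
The scaling decision between `minConcurrency` and `maxConcurrency` can be pictured as a small feedback loop: step up while the system is healthy, step down when it is overloaded, and never leave the configured bounds. The helper below is a simplified sketch of that idea. `nextConcurrency`, the `overloaded` flag, and the 0.25 step ratio are assumptions made for illustration and do not reflect Crawlee's actual algorithm or defaults.

```typescript
// Conceptual sketch only: a hypothetical helper, not Crawlee's autoscaling code.
interface ConcurrencyLimits {
    minConcurrency: number;
    maxConcurrency: number;
}

// Decide the concurrency for the next interval.
// `overloaded` stands in for whatever load signal is available (CPU, memory, errors).
function nextConcurrency(
    current: number,
    overloaded: boolean,
    limits: ConcurrencyLimits,
    stepRatio = 0.25,
): number {
    // Move by a fraction of the current value, but always by at least 1.
    const step = Math.max(1, Math.ceil(current * stepRatio));
    const proposed = overloaded ? current - step : current + step;
    // Never leave the configured bounds.
    return Math.min(limits.maxConcurrency, Math.max(limits.minConcurrency, proposed));
}

// Usage: start at the minimum and react to a sequence of load signals.
const limits: ConcurrencyLimits = { minConcurrency: 1, maxConcurrency: 100 };
let concurrency = limits.minConcurrency;
const loadSignals = [false, false, false, true, false];
const trace: number[] = [];
for (const overloaded of loadSignals) {
    concurrency = nextConcurrency(concurrency, overloaded, limits);
    trace.push(concurrency);
}
console.log(trace); // [ 2, 3, 4, 3, 4 ]
```

Scaling by a ratio rather than a fixed amount means larger pools adjust in larger steps, while the clamp guarantees the limits from the crawler options are respected.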

### 5. Built-in Storage

```typescript
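import { Dataset, KeyValueStore, RequestQueue } from 'crawlee';

// Dataset for structured data (title, price, url come from your handler)
await Dataset.pushData({ title, price, url });
// Save the whole dataset as CSV under the key 'results' in the default key-value store
await Dataset.exportToCSV('results');

// Key-value store for files (buffer holds the binary content, e.g. a screenshot)
await KeyValueStore.setValue('screenshot', buffer, { contentType: 'image/png' });

// Request queue for URLs: open the default queue, then add requests to it
const requestQueue = await RequestQueue.open();
await requestQueue.addRequest({ url: 'https://...' });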
```

## Python Version

```python
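import asyncio

# Recent releases export these from crawlee.crawlers (older: crawlee.playwright_crawler)
from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext


async def main() -> None:
    crawler = PlaywrightCrawler()

    @crawler.router.default_handler
    async def handler(context: PlaywrightCrawlingContext) -> None:
        title = await context.page.title()
        await context.push_data({'title': title})

    await crawler.run(['https://example.com'])


if __name__ == '__main__':
    asyncio.run(main())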
```

## FAQ

**Q: How does it compare to Scrapy?**
A: Crawlee has first-class browser support, built-in anti-bot features, and works in both JS and Python. Scrapy is Python-only and HTTP-focused.

**Q: Is it from the Apify team?**
A: Yes, Crawlee is an open-source project by Apify. It can run standalone or be deployed to the Apify cloud.

**Q: Can it handle SPAs?**
A: Yes, PlaywrightCrawler renders JavaScript and waits for dynamic content.

## Source & Thanks

- GitHub: [apify/crawlee](https://github.com/apify/crawlee) (16k+ stars)
- Docs: [crawlee.dev](https://crawlee.dev)

---
Source: https://tokrepo.com/en/workflows/crawlee-web-scraping-browser-automation-library-8f2c0ae9
Author: Apify