# Crawlee: Web Scraping and Browser Automation Library

> Build reliable web scrapers in Node.js or Python. Crawlee handles proxy rotation, browser fingerprints, auto-scaling, and anti-bot bypassing out of the box.

## Install

Save the content below to `.claude/skills/` or append to your `CLAUDE.md`:

## Quick Use

```bash
npx crawlee create my-scraper
cd my-scraper
npm start
```

Or in Python:

```bash
# Quote the extra so shells such as zsh do not expand the brackets
pip install 'crawlee[playwright]'
# Download the browser binaries that Playwright drives
playwright install
```

## What is Crawlee?

Crawlee is a web scraping and browser automation library that handles the hard parts (proxy rotation, browser fingerprints, retries, auto-scaling, and storage) so you can focus on the extraction logic. It is available for Node.js and Python.

**Answer-Ready**: Crawlee is a web scraping library for Node.js and Python that handles proxy rotation, browser fingerprints, auto-scaling, and anti-bot bypassing for reliable data extraction.

## Core Features

### 1. Multiple Crawler Types

```typescript
// HTTP crawler (fastest, for simple pages)
import { CheerioCrawler, Dataset } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ request, $ }) {
        // $ is a Cheerio handle on the parsed HTML of the response
        const title = $('title').text();
        await Dataset.pushData({ url: request.url, title });
    },
});

await crawler.run(['https://example.com']);
```

```typescript
// Browser crawler (for JS-rendered pages)
import { PlaywrightCrawler, Dataset } from 'crawlee';

const crawler = new PlaywrightCrawler({
    async requestHandler({ page }) {
        // Wait until client-side rendering has produced the elements
        await page.waitForSelector('.product');
        const items = await page.$$eval('.product', (els) =>
            els.map((el) => ({ name: el.textContent })),
        );
        // Store the extracted items instead of discarding them
        await Dataset.pushData(items);
    },
});
```

### 2. Anti-Bot Features

Built-in fingerprint randomization and session management:

```typescript
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    // Reuse sessions (cookies plus proxy pairing) across requests
    useSessionPool: true,
    sessionPoolOptions: { maxPoolSize: 100 },
    browserPoolOptions: {
        fingerprintOptions: {
            fingerprintGeneratorOptions: {
                // Generate fingerprints that look like these browsers
                browsers: ['chrome', 'firefox'],
            },
        },
    },
});
```

### 3. Proxy Rotation

```typescript
import { CheerioCrawler, ProxyConfiguration } from 'crawlee';

const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        'http://proxy1:8080',
        'http://proxy2:8080',
    ],
});

const crawler = new CheerioCrawler({
    proxyConfiguration, // Proxies are rotated automatically across requests
});
```

### 4. Auto-Scaling

Adjusts concurrency based on system resources and target site response:

```typescript
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    minConcurrency: 1,
    maxConcurrency: 100,
    // Auto-scales between these limits
});
```

### 5. Built-in Storage

```typescript
import { Dataset, KeyValueStore, RequestQueue } from 'crawlee';

// Dataset for structured data (title, price, and url come from your handler)
await Dataset.pushData({ title, price, url });
await Dataset.exportToCSV('results');

// Key-value store for files (buffer holds the screenshot bytes)
await KeyValueStore.setValue('screenshot', buffer, { contentType: 'image/png' });

// Request queue for URLs: open the default queue, then add to it
const requestQueue = await RequestQueue.open();
await requestQueue.addRequest({ url: 'https://...' });
```

## Python Version

```python
import asyncio

# Recent releases export crawlers from crawlee.crawlers;
# older ones used crawlee.playwright_crawler.
from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext

async def main():
    crawler = PlaywrightCrawler()

    @crawler.router.default_handler
    async def handler(context: PlaywrightCrawlingContext):
        title = await context.page.title()
        await context.push_data({'title': title})

    await crawler.run(['https://example.com'])

# `await` is only valid inside a coroutine, so drive main() with asyncio
asyncio.run(main())
```

## FAQ

**Q: How does it compare to Scrapy?**
A: Crawlee has first-class browser support, built-in anti-bot features, and works in both JS and Python. Scrapy is Python-only and HTTP-focused.

**Q: Is it from the Apify team?**
A: Yes, Crawlee is open source and maintained by Apify. It can run standalone or be deployed to the Apify cloud.

**Q: Can it handle SPAs?**
A: Yes, PlaywrightCrawler renders JavaScript and waits for dynamic content.

## Source & Thanks

- GitHub: [apify/crawlee](https://github.com/apify/crawlee) (16k+ stars)
- Docs: [crawlee.dev](https://crawlee.dev)

## Quick Start

```bash
npx crawlee create my-scraper
```

One command creates a scraper project with built-in proxy rotation and anti-detection.

## What is Crawlee?
Crawlee is a Node.js/Python web scraping library that automatically handles proxy rotation, browser fingerprinting, retries, auto-scaling, and data storage.

**In one sentence**: Crawlee is a web scraping library for Node.js and Python with built-in proxy rotation, anti-detection, and auto-scaling.

## Core Features

### 1. Multiple Crawler Types

HTTP crawlers (fast) and browser crawlers (JS rendering).

### 2. Anti-Detection

Built-in browser fingerprint randomization and session management.

### 3. Proxy Rotation

Automatic per-request proxy rotation.

### 4. Auto-Scaling

Adjusts concurrency based on system resources and target site response.

### 5. Built-In Storage

Structured datasets, key-value stores, and request queues.

## FAQ

**Q: How does it compare to Scrapy?**
A: Crawlee has native browser support, built-in anti-detection, and works in both JS and Python. Scrapy is Python-only and primarily HTTP-based.

## Source & Thanks

- GitHub: [apify/crawlee](https://github.com/apify/crawlee) (16k+ stars)

---

Source: https://tokrepo.com/en/workflows/crawlee-web-scraping-browser-automation-library-8f2c0ae9

Author: Apify
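
As a rough picture of the per-request proxy rotation and session management listed under Core Features, the sketch below cycles through a list of proxy URLs round-robin and pins a proxy to a session once that session has used it. It is a standalone illustration with no dependencies, not Crawlee's implementation: the class `RoundRobinProxies`, its method `nextUrl`, and the session name `session-a` are hypothetical names chosen for this example.

```typescript
// Hypothetical helper for illustration only; not part of Crawlee's API.
class RoundRobinProxies {
    private readonly proxyUrls: readonly string[];
    private readonly bySession = new Map<string, string>();
    private nextIndex = 0;

    constructor(proxyUrls: readonly string[]) {
        if (proxyUrls.length === 0) {
            throw new Error('At least one proxy URL is required');
        }
        this.proxyUrls = proxyUrls;
    }

    // Returns the next proxy in the cycle. When a sessionId is given,
    // the same session always receives the proxy it was first assigned.
    nextUrl(sessionId?: string): string {
        if (sessionId !== undefined) {
            const pinned = this.bySession.get(sessionId);
            if (pinned !== undefined) return pinned;
        }
        const url = this.proxyUrls[this.nextIndex];
        this.nextIndex = (this.nextIndex + 1) % this.proxyUrls.length;
        if (sessionId !== undefined) this.bySession.set(sessionId, url);
        return url;
    }
}

const proxies = new RoundRobinProxies(['http://proxy1:8080', 'http://proxy2:8080']);

// Without a session, each call moves to the next proxy and wraps around.
const picks = [proxies.nextUrl(), proxies.nextUrl(), proxies.nextUrl()];
console.log(picks);

// With a session, the first assignment sticks for later calls.
const first = proxies.nextUrl('session-a');
const second = proxies.nextUrl('session-a');
console.log(first === second);
```

Pinning a proxy to a session keeps cookies and IP address consistent for one simulated visitor, while requests without a session spread evenly over the pool.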