Skills · April 7, 2026 · 1 min read

Crawlee — Web Scraping and Browser Automation Library

Build reliable web scrapers in Node.js or Python. Crawlee handles proxy rotation, browser fingerprints, auto-scaling, and anti-bot bypassing out of the box.

Apify · Community

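Agent ready

Agent can install directly

This asset is installable: the agent first selects the current runtime, checks the install plan, and then runs the matching command.

Native · 98/100 · Policy: allowed

Agent entry point

Any MCP/CLI agent

Type

Skill

Install

Single

Trust

Trust level: Community

Entry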

Crawlee — Web Scraping and Browser Automation Library

Direct install command

npx -y tokrepo@latest install 8f2c0ae9-1327-481f-a519-d473751bdd76 --target codex

Do a dry run to confirm the install plan before running this command.

TL;DR

Crawlee handles proxy rotation, fingerprints, and anti-bot bypassing so you build reliable scrapers fast.

§01

What it is

Crawlee is a web scraping and browser automation library for Node.js and Python. It handles the hard parts of web scraping: proxy rotation, browser fingerprints, automatic retries, request queuing, auto-scaling, and anti-bot bypassing. Crawlee supports HTTP crawling (Cheerio/BeautifulSoup), headless browsers (Playwright/Puppeteer), and adaptive switching between modes.

Crawlee targets developers building production web scrapers who need reliability and scale. Instead of writing retry logic, proxy management, and fingerprint rotation from scratch, Crawlee provides these as built-in features.

§02

How it saves time or tokens

Building a reliable web scraper means dealing with rate limiting, CAPTCHAs, IP blocks, JavaScript rendering, and data extraction. Crawlee addresses most of these concerns in a single library. Its auto-scaling feature adjusts concurrency based on available system resources such as CPU and memory. Proxy rotation and browser fingerprint management reduce blocks without custom code.
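To illustrate what resource-based auto-scaling means, here is a minimal Python sketch. It is not Crawlee's algorithm: the `ConcurrencyController` class, its thresholds, and the halve-on-overload rule are hypothetical choices made for this example. It grows concurrency one step at a time while CPU and memory stay under their limits and halves it when either limit is crossed.

```python
from dataclasses import dataclass


@dataclass
class ConcurrencyController:
    """Hypothetical resource-based autoscaler (illustration only, not Crawlee code)."""

    min_concurrency: int = 1
    max_concurrency: int = 50
    current: int = 1
    cpu_limit: float = 0.9     # CPU usage fraction treated as "overloaded" (assumed)
    memory_limit: float = 0.7  # memory usage fraction treated as "overloaded" (assumed)

    def update(self, cpu_usage: float, memory_usage: float) -> int:
        """Adjust and return the number of parallel requests for the next interval."""
        overloaded = cpu_usage > self.cpu_limit or memory_usage > self.memory_limit
        if overloaded:
            # Scale down multiplicatively so pressure drops quickly.
            self.current = max(self.min_concurrency, self.current // 2)
        else:
            # Scale up one step at a time while there is headroom.
            self.current = min(self.max_concurrency, self.current + 1)
        return self.current


if __name__ == "__main__":
    controller = ConcurrencyController(current=4)
    readings = [(0.3, 0.4), (0.5, 0.5), (0.95, 0.6), (0.4, 0.8), (0.2, 0.3)]
    for cpu, mem in readings:
        print(controller.update(cpu, mem))
```

Starting from 4, those readings yield 5, 6, 3, 1, 2. Growing slowly and shrinking fast keeps the machine from thrashing once it is overloaded.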

§03

How to use

Create a new scraper:

npx crawlee create my-scraper
cd my-scraper
npm start

Or install manually:

npm install crawlee playwright

Python version:

pip install 'crawlee[playwright]'
playwright install

§04

Example

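import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
  // Stop after 100 pages so a run stays bounded.
  maxRequestsPerCrawl: 100,
  async requestHandler({ page, request, enqueueLinks }) {
    const title = await page.title();
    // $eval throws when no element matches '.price', so fall back to null.
    const price = await page
      .$eval('.price', (el) => el.textContent?.trim())
      .catch(() => null);

    console.log(`${request.url}: ${title} - ${price ?? 'no price'}`);

    // Follow pagination links on the same domain
    await enqueueLinks({
      selector: '.pagination a',
      strategy: 'same-domain',
    });
  },
});

// Top-level await requires an ES module (e.g. "type": "module" in package.json).
await crawler.run(['https://example.com/products']);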

# Python version
import asyncio

from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext


async def main() -> None:
    crawler = PlaywrightCrawler(max_requests_per_crawl=100)

    @crawler.router.default_handler
    async def handler(context: PlaywrightCrawlingContext) -> None:
        title = await context.page.title()
        context.log.info(f'{context.request.url}: {title}')
        # Follow links that stay on the same domain
        await context.enqueue_links(strategy='same-domain')

    await crawler.run(['https://example.com'])


if __name__ == '__main__':
    asyncio.run(main())
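The `same-domain` strategy keeps only links on the same domain as the current page, subdomains included. The Python sketch below shows that filtering idea using only the standard library; `base_domain` and `same_domain_links` are hypothetical helpers, not Crawlee functions, and taking the last two hostname labels is a simplification (a real implementation would consult the Public Suffix List).

```python
from urllib.parse import urldefrag, urljoin, urlparse


def base_domain(url: str) -> str:
    """Naive registrable domain: the last two hostname labels.

    Simplification: shop.example.com -> example.com works, but hosts such as
    example.co.uk need the Public Suffix List to be handled correctly.
    """
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def same_domain_links(page_url: str, hrefs: list[str]) -> list[str]:
    """Resolve hrefs against page_url and keep unique http(s) links on the same domain."""
    target = base_domain(page_url)
    kept: list[str] = []
    for href in hrefs:
        # Make relative links absolute and drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar links
        if base_domain(absolute) == target and absolute not in kept:
            kept.append(absolute)
    return kept


if __name__ == "__main__":
    links = same_domain_links(
        "https://www.example.com/products?page=1",
        ["?page=2", "/products?page=3", "https://shop.example.com/sale",
         "https://other.org/x", "mailto:sales@example.com", "?page=2#top"],
    )
    print(links)
```

With that input the sketch keeps the two pagination URLs and the `shop.example.com` link, and drops the foreign domain, the `mailto:` link, and the fragment-only duplicate.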

§05

Related on TokRepo

Web Scraping Tools — Scraping and data extraction tools
Browser Automation — Automate browser interactions

Crawlee fits into standard Node.js and Python development workflows and needs little configuration to get started. It is open-source, with documentation and community support through the official repository, and stable releases follow semantic versioning.

For teams evaluating Crawlee, the key advantage is less manual work on repetitive scraping infrastructure: retries, proxy rotation, request queuing, and scaling are built in, so there is less custom code to maintain and fewer integration points to manage. This translates directly into lower maintenance costs and faster iteration cycles.

§06

Common pitfalls

PlaywrightCrawler launches real browsers, which consume significant memory; use CheerioCrawler (BeautifulSoupCrawler in Python) for pages that do not require JavaScript rendering.
Proxy rotation requires proxy URLs configured in the crawler options; Crawlee does not provide proxies, only the rotation logic.
Respect robots.txt and website terms of service; Crawlee provides the technical capability but compliance is your responsibility.
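One way to check robots.txt before enqueuing a URL is Python's standard `urllib.robotparser` module. This minimal sketch parses robots.txt text you have already downloaded, so it makes no network calls; `load_robots`, the sample rules, and the `my-scraper` user agent are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def load_robots(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text that was already fetched (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Hypothetical rules for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

if __name__ == "__main__":
    robots = load_robots(ROBOTS_TXT)
    for url in ("https://example.com/products", "https://example.com/private/data"):
        # can_fetch() answers: may this user agent request this URL?
        print(url, robots.can_fetch("my-scraper", url))
    # crawl_delay() returns the requested delay in seconds, or None if unset.
    print("crawl delay:", robots.crawl_delay("my-scraper"))
```

Note that robots.txt covers only part of compliance; website terms of service still need a separate review.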

FAQ

What is the difference between CheerioCrawler and PlaywrightCrawler?

CheerioCrawler makes HTTP requests and parses HTML with Cheerio (no browser). PlaywrightCrawler launches a headless browser for pages that require JavaScript rendering. Use CheerioCrawler when possible for better performance and lower resource usage.
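Without a browser, an HTTP crawler essentially fetches the HTML and parses it. The sketch below stands in for Cheerio/BeautifulSoup with Python's built-in `html.parser`, pulling the page title and link targets out of static HTML; the `TitleAndLinks` class and `extract` function are hypothetical. Content that only appears after JavaScript runs would be invisible to this approach, which is when a browser-based crawler is needed.

```python
from html.parser import HTMLParser


class TitleAndLinks(HTMLParser):
    """Collect the <title> text and every <a href> from static HTML (no JavaScript)."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.links: list[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:  # ignore anchors without an href
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract(html: str) -> tuple[str, list[str]]:
    """Return (title, hrefs) for one HTML document."""
    parser = TitleAndLinks()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.links


if __name__ == "__main__":
    sample = (
        "<html><head><title> Products </title></head><body>"
        "<a href='/p/1'>One</a><a>no href</a><a href=\"/p/2\">Two</a>"
        "</body></html>"
    )
    print(extract(sample))
```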

Does Crawlee handle anti-bot protection?

Yes. Crawlee includes browser fingerprint rotation, request header randomization, and session management to reduce detection. For advanced anti-bot systems, combine with proxy rotation and human-like browsing patterns.
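Session management and proxy rotation work together. In the simplified model below, each session stays on one proxy, and a session that keeps getting blocked is retired so its next request gets a fresh proxy. This is a hypothetical sketch of the idea, not Crawlee's `SessionPool` or `ProxyConfiguration` API; `ProxySessionPool`, `max_errors`, and the proxy URLs are placeholders, and the proxies themselves must be supplied by you.

```python
from itertools import cycle


class ProxySessionPool:
    """Hypothetical model: pin each session to one proxy, retire sessions that get blocked."""

    def __init__(self, proxy_urls: list[str], max_errors: int = 3) -> None:
        if not proxy_urls:
            raise ValueError("at least one proxy URL is required")
        self._proxies = cycle(proxy_urls)  # round-robin over the supplied proxies
        self._max_errors = max_errors
        self._sessions: dict[str, dict] = {}  # session id -> {"proxy": ..., "errors": ...}

    def proxy_for(self, session_id: str) -> str:
        """Return the proxy pinned to this session, assigning the next one on first use."""
        if session_id not in self._sessions:
            self._sessions[session_id] = {"proxy": next(self._proxies), "errors": 0}
        return self._sessions[session_id]["proxy"]

    def mark_blocked(self, session_id: str) -> None:
        """Count a block (e.g. HTTP 403 or 429); drop the session at max_errors."""
        session = self._sessions.get(session_id)
        if session is None:
            return
        session["errors"] += 1
        if session["errors"] >= self._max_errors:
            del self._sessions[session_id]  # next request gets a fresh proxy


if __name__ == "__main__":
    pool = ProxySessionPool(
        ["http://proxy-1:8000", "http://proxy-2:8000", "http://proxy-3:8000"], max_errors=2
    )
    print(pool.proxy_for("a"), pool.proxy_for("b"))
    pool.mark_blocked("a")
    pool.mark_blocked("a")  # second block retires session "a"
    print(pool.proxy_for("a"))
```

Keeping a session on one proxy keeps its cookies and IP address consistent, which looks more like a real visitor than switching IPs on every request.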

Does Crawlee support Python?

Yes. Crawlee has official Python support with largely the same feature set as the Node.js version. Install with pip install 'crawlee[playwright]' for browser-based scraping or pip install 'crawlee[beautifulsoup]' for HTTP scraping.

Can Crawlee scale to large websites?

Yes. Crawlee includes auto-scaling that adjusts concurrency based on available system resources. Its request queue is built for crawls with millions of URLs and provides automatic deduplication and retry logic.
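Deduplication and retries can be pictured as a queue that remembers every URL key it has seen and re-enqueues failed requests up to a limit. This in-memory Python sketch is hypothetical: `RequestQueue`, `unique_key`, its normalization rules, and the default of 3 retries are assumptions for illustration. Crawlee's real request queue is persistent and considerably more capable.

```python
from collections import deque
from urllib.parse import urlsplit, urlunsplit


def unique_key(url: str) -> str:
    """Hypothetical dedup key: lowercase scheme and host, drop the #fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, ""))


class RequestQueue:
    """In-memory queue that skips duplicate URLs and retries failed requests."""

    def __init__(self, max_retries: int = 3) -> None:
        self._pending: deque = deque()  # items are (url, retry_count)
        self._seen: set[str] = set()
        self.max_retries = max_retries
        self.failed: list[str] = []

    def add(self, url: str) -> bool:
        """Enqueue url unless its key was seen before; return True if it was added."""
        key = unique_key(url)
        if key in self._seen:
            return False
        self._seen.add(key)
        self._pending.append((url, 0))
        return True

    def run(self, handler) -> list[str]:
        """Call handler(url) for every request; return the URLs that succeeded."""
        done: list[str] = []
        while self._pending:
            url, retries = self._pending.popleft()
            try:
                handler(url)
                done.append(url)
            except Exception:
                if retries < self.max_retries:
                    self._pending.append((url, retries + 1))  # retry later
                else:
                    self.failed.append(url)  # give up after max_retries retries
        return done


if __name__ == "__main__":
    attempts: dict[str, int] = {}

    def flaky_handler(url: str) -> None:
        attempts[url] = attempts.get(url, 0) + 1
        if "flaky" in url and attempts[url] < 3:
            raise RuntimeError("temporary failure")
        if "broken" in url:
            raise RuntimeError("permanent failure")

    queue = RequestQueue()
    for url in ["https://Example.com/a", "https://example.com/a#reviews",
                "https://example.com/flaky", "https://example.com/broken"]:
        queue.add(url)
    print(queue.run(flaky_handler), queue.failed)
```

Here `https://Example.com/a` and `https://example.com/a#reviews` map to the same key, so the second URL is never enqueued. The flaky URL succeeds on its third attempt, while the broken one is given up after one attempt plus three retries.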

Is Crawlee free?

Yes. Crawlee is open-source under the Apache 2.0 license. It is developed by Apify but can be used independently without an Apify account. Apify offers a managed platform for running Crawlee scrapers in the cloud.



Source and acknowledgments

GitHub: apify/crawlee (16k+ stars)


Related assets

Obscura — Headless Browser Built for AI Agents and Web Scraping

Lightpanda — High-Performance Headless Browser for AI and Automation

Scrapling — Adaptive Web Scraping Framework for Python

ScrapeGraphAI — AI-Powered Web Scraping