What is Crawlee — Production Web Scraping for Node.js?

Crawlee is a Node.js library by Apify for building reliable crawlers, with automatic proxy rotation, request queuing, and browser automation. It has 22K+ GitHub stars.

Is Crawlee — Production Web Scraping for Node.js free to use?

Yes. Crawlee — Production Web Scraping for Node.js is freely available on TokRepo. Check the Source & Thanks section on the asset page for the specific open-source license.

How do I install Crawlee — Production Web Scraping for Node.js?

Visit the asset page on TokRepo and click "Copy for agent" to get the installation instructions. Most assets can be installed with a single command.

Crawlee — Production Web Scraping for Node.js

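```typescript
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    // Called for every page opened in the headless browser.
    async requestHandler({ request, page, enqueueLinks }) {
        const title = await page.title();
        console.log(`${title}: ${request.url}`);
        // Queue only links matching the blog glob.
        await enqueueLinks({ globs: ['https://example.com/blog/**'] });
    },
});

// Top-level await: run this file as an ES module.
await crawler.run(['https://example.com/blog']);
```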

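Overview

Crawlee is a Node.js library for web scraping and browser automation, built by the Apify team and with 22,600+ GitHub stars. It offers a single interface for building production-grade crawlers across plain HTTP requests (Cheerio), headless browsers (Playwright/Puppeteer), and adaptive crawling. Built-in proxy rotation, request queuing, automatic retries, and persistent storage make it well suited to preparing data for AI/LLM pipelines.

Works with: Node.js, TypeScript, Playwright, Puppeteer, Cheerio. Aimed at developers building data pipelines for AI applications.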

Core features

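Three main crawler types

CheerioCrawler: plain HTTP requests; the fastest option, suited to static pages
PlaywrightCrawler: renders pages in a real browser; suited to JavaScript-heavy single-page apps
AdaptivePlaywrightCrawler: switches automatically between plain HTTP and a browser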
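For comparison with the Playwright example above, here is a minimal CheerioCrawler sketch. It assumes `crawlee` is installed from npm (`npm install crawlee`) and that the file runs as an ES module, for top-level `await`. To stay runnable offline it serves a hypothetical two-page site from a local `node:http` server; the page contents and the `titles` array are illustrative only.

```typescript
// Minimal CheerioCrawler sketch; assumes `npm install crawlee` and an ES module (top-level await).
import { createServer } from 'node:http';
import type { AddressInfo } from 'node:net';
import { CheerioCrawler } from 'crawlee';

// Hypothetical two-page static site, served locally so the sketch needs no internet access.
const pages: Record<string, string> = {
    '/': '<html><head><title>Home</title></head><body><a href="/about">About</a></body></html>',
    '/about': '<html><head><title>About</title></head><body><p>No links here.</p></body></html>',
};

const server = createServer((req, res) => {
    const html: string | undefined = pages[req.url ?? '/'];
    res.writeHead(html ? 200 : 404, { 'content-type': 'text/html' });
    res.end(html ?? 'Not found');
});
await new Promise<void>((resolve) => server.listen(0, '127.0.0.1', () => resolve()));
const { port } = server.address() as AddressInfo;

const titles: string[] = [];

// Plain HTTP requests parsed with Cheerio: no browser is launched.
const crawler = new CheerioCrawler({
    maxRequestsPerCrawl: 10,
    async requestHandler({ $, enqueueLinks }) {
        titles.push($('title').text());
        // With no options, enqueueLinks() follows <a href> links on the same hostname.
        await enqueueLinks();
    },
});

await crawler.run([`http://127.0.0.1:${port}/`]);

// Close open keep-alive sockets so the process can exit.
server.closeAllConnections();
server.close();
console.log(titles);
```

Swapping in `PlaywrightCrawler` keeps the same structure, but the handler then receives a Playwright `page` instead of Cheerio's `$`, and Playwright itself must also be installed.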

Proxy rotation

Built-in proxy management and session persistence, which help avoid IP bans.
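A minimal sketch of proxy rotation with Crawlee's `ProxyConfiguration`, assuming `crawlee` from npm, an ES module, and two hypothetical proxy URLs. `newUrl()` only returns URL strings, so no proxy is contacted here. Without a session ID, successive calls cycle through the list; with a session ID, the same proxy is handed back for that session, which is how a session stays on one IP.

```typescript
// Minimal proxy-rotation sketch; assumes `npm install crawlee` and an ES module (top-level await).
import { ProxyConfiguration } from 'crawlee';

// Hypothetical proxy endpoints: replace with your own. newUrl() only builds
// URL strings, so these hosts are never contacted in this sketch.
const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        'http://proxy-1.example.com:8000',
        'http://proxy-2.example.com:8000',
    ],
});

// No session ID: each call takes the next URL in the list (round-robin).
const rotated = [
    await proxyConfiguration.newUrl(),
    await proxyConfiguration.newUrl(),
    await proxyConfiguration.newUrl(),
];

// Same session ID: the session stays pinned to the proxy it was first given.
const sessionFirst = await proxyConfiguration.newUrl('sessionA');
const sessionAgain = await proxyConfiguration.newUrl('sessionA');

console.log(rotated, sessionFirst, sessionAgain);
```

In a crawler you would pass the same object as the `proxyConfiguration` option, for example `new CheerioCrawler({ proxyConfiguration, requestHandler })`, and the crawler's session pool supplies the session IDs.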

Request queue

A persistent queue that can resume after a crash, with configurable retry policies.
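A sketch of the retry behaviour, assuming `crawlee` from npm and an ES module. When a request handler throws, Crawlee puts the request back into its queue and retries it up to `maxRequestRetries` times; `failedRequestHandler` runs only after all retries are used up. The `/flaky` and `/broken` paths and the simulated errors are hypothetical, chosen so that one request recovers and one gives up.

```typescript
// Minimal retry sketch; assumes `npm install crawlee` and an ES module (top-level await).
import { createServer } from 'node:http';
import type { AddressInfo } from 'node:net';
import { CheerioCrawler } from 'crawlee';

// Local server that answers every path with the same small page.
const server = createServer((_req, res) => {
    res.writeHead(200, { 'content-type': 'text/html' });
    res.end('<html><head><title>Demo</title></head><body>OK</body></html>');
});
await new Promise<void>((resolve) => server.listen(0, '127.0.0.1', () => resolve()));
const { port } = server.address() as AddressInfo;
const base = `http://127.0.0.1:${port}`;

let flakyRetryCount = -1;
const failedUrls: string[] = [];

const crawler = new CheerioCrawler({
    // A request whose handler throws is re-queued and retried up to 2 more times.
    maxRequestRetries: 2,
    async requestHandler({ request }) {
        // Hypothetical failures simulated in the handler for the demo:
        // '/broken' fails every time, '/flaky' fails only on its first attempt.
        if (request.url.endsWith('/broken')) throw new Error('Simulated permanent failure');
        if (request.url.endsWith('/flaky') && request.retryCount === 0) {
            throw new Error('Simulated transient failure');
        }
        // Only '/flaky' gets here; retryCount = number of earlier failed attempts.
        flakyRetryCount = request.retryCount;
    },
    // Called once a request has used up all of its retries.
    async failedRequestHandler({ request }) {
        failedUrls.push(request.url);
    },
});

const stats = await crawler.run([`${base}/flaky`, `${base}/broken`]);

server.closeAllConnections();
server.close();
console.log({ flakyRetryCount, failedUrls, finished: stats.requestsFinished, failed: stats.requestsFailed });
```

Resuming after a crash relies on the queue persisted under `./storage`. Crawlee purges its default storages at startup unless `CRAWLEE_PURGE_ON_START` is set to `false`, so check that setting before counting on recovery.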

Structured storage

Dataset storage that needs no external dependencies, with JSON/CSV export.
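A minimal sketch of the dataset workflow, assuming `crawlee` from npm and an ES module. The dataset name `articles-demo`, the export key `articles`, and the records are hypothetical. Data is kept on local disk under `./storage`, so no database is involved; `exportToJSON` works the same way as `exportToCSV` if JSON is preferred.

```typescript
// Minimal Dataset sketch; assumes `npm install crawlee` and an ES module (top-level await).
import { Dataset } from 'crawlee';

// A named dataset (hypothetical name) stored locally under ./storage.
const dataset = await Dataset.open('articles-demo');

// Hypothetical records; inside a crawler you would push from the requestHandler instead.
await dataset.pushData([
    { url: 'https://example.com/blog/first', title: 'First post' },
    { url: 'https://example.com/blog/second', title: 'Second post' },
]);

// Read the items back in insertion order.
const { items } = await dataset.getData();

// Save a CSV snapshot to the default key-value store under the key 'articles'.
await dataset.exportToCSV('articles');

// Delete the demo dataset so a re-run starts empty.
await dataset.drop();

console.log(items);
```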

AI/LLM integration

Extract clean text and feed it directly into RAG pipelines or LLM training datasets.
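A sketch of the clean-text step, assuming `crawlee` from npm and an ES module. It removes `script`, `style`, and `nav` elements with Cheerio, collapses whitespace, and splits the result into overlapping chunks. The local page, the removed selectors, and the `chunkText` helper with its 500/50 character defaults are illustrative assumptions, not Crawlee APIs; a real pipeline would likely tune chunk sizes or split on token counts instead.

```typescript
// Minimal text-extraction sketch; assumes `npm install crawlee` and an ES module (top-level await).
import { createServer } from 'node:http';
import type { AddressInfo } from 'node:net';
import { CheerioCrawler } from 'crawlee';

// Hypothetical page: readable content mixed with navigation, styling and script noise.
const html = `<html><head><title>Guide</title><style>body { color: red; }</style></head>
<body><nav>Home | Blog</nav><main><h1>Install</h1>
<p>Run   npm install   crawlee.</p></main>
<script>console.log('tracking');</script></body></html>`;

const server = createServer((_req, res) => {
    res.writeHead(200, { 'content-type': 'text/html' });
    res.end(html);
});
await new Promise<void>((resolve) => server.listen(0, '127.0.0.1', () => resolve()));
const { port } = server.address() as AddressInfo;

// Hypothetical helper (not part of Crawlee): fixed-size character chunks with overlap,
// a common way to prepare text for embedding in a RAG pipeline.
function chunkText(text: string, size = 500, overlap = 50): string[] {
    if (overlap >= size) throw new RangeError('overlap must be smaller than size');
    const chunks: string[] = [];
    for (let start = 0; start < text.length; start += size - overlap) {
        chunks.push(text.slice(start, start + size));
        if (start + size >= text.length) break; // this chunk already reaches the end
    }
    return chunks;
}

const docs: { url: string; text: string; chunks: string[] }[] = [];

const crawler = new CheerioCrawler({
    async requestHandler({ request, $ }) {
        // Remove elements whose text is not page content.
        $('script, style, nav').remove();
        // Collapse whitespace runs so the extracted text is compact.
        const text = $('body').text().replace(/\s+/g, ' ').trim();
        docs.push({ url: request.url, text, chunks: chunkText(text) });
    },
});

await crawler.run([`http://127.0.0.1:${port}/`]);

server.closeAllConnections();
server.close();
console.log(docs[0]?.text);
```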


Related assets

Pydantic — Data Validation for AI Agent Pipelines

Open WebUI — Self-Hosted ChatGPT Alternative

Docusaurus — Build AI Tool Documentation Sites