# Apify Actor SDK: Headless Web Automation at Cloud Scale

> The Apify SDK turns a Crawlee/Playwright script into a managed cloud Actor, with automatic retries, proxy rotation, dataset storage, and a request queue out of the box.

## Quick Use

1. `npx apify-cli create my-scraper` (pick a template)
2. Edit `src/main.ts` to define your crawl logic
3. `apify run` locally; `apify push` to deploy to Apify Cloud

---

## Intro

The Apify SDK packages a Crawlee or Playwright script into an Apify Actor: a containerized program that runs on Apify's cloud with built-in proxy rotation, retries, dataset persistence, a request queue, and scheduling. You write the scraping logic; the SDK handles the infrastructure.

Best for: production scrapers and browser-automation agents that need reliability and observability without building their own queue.

Works with: Node 20+, Python 3.10+.

Setup time: about 5 minutes (`npx apify-cli create`).

---

### Scaffold an Actor

```bash
npx apify-cli create my-scraper
# Pick the "Crawlee + PlaywrightCrawler" template when prompted
cd my-scraper
```

### Write the scraping logic

```typescript
// src/main.ts
import { Actor, log } from "apify";
import { PlaywrightCrawler } from "crawlee";

// Initialize the Actor runtime before touching input or storage.
await Actor.init();

// Read the run input; maxRequests defaults to 100 when omitted.
const { startUrls, maxRequests = 100 } = (await Actor.getInput<{
  startUrls: string[];
  maxRequests?: number;
}>())!;

const crawler = new PlaywrightCrawler({
  maxRequestsPerCrawl: maxRequests,
  async requestHandler({ page, request, enqueueLinks }) {
    log.info(`Crawling ${request.url}`);

    const title = await page.title();
    const content = await page.locator("article").textContent();

    // Store one record per page, capping content at 5,000 characters.
    await Actor.pushData({
      url: request.url,
      title,
      content: content?.slice(0, 5000),
    });

    // Follow only links that sit under the current page's URL.
    await enqueueLinks({ globs: [`${request.url}**`] });
  },
});

await crawler.run(startUrls);
await Actor.exit();
```

### Run locally vs in cloud

```bash
# Locally
apify run

# Push to Apify cloud
apify push

# Run via API (replace <ACTOR_ID> with your Actor's ID)
curl -X POST "https://api.apify.com/v2/acts/<ACTOR_ID>/runs?token=$APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "startUrls": ["https://example.com/blog"], "maxRequests": 50 }'
```

### Use the dataset

```bash
# Fetch results as JSON / CSV / XLSX (replace <ACTOR_ID> with your Actor's ID)
curl "https://api.apify.com/v2/acts/<ACTOR_ID>/runs/last/dataset/items?format=json&token=$APIFY_TOKEN"
```

Apify Actors follow enqueued links automatically, retry failed pages, rotate proxies, and persist crawl state across runs. The 4,000+ Actors in the Apify Store are built on this same SDK.

---

### FAQ

**Q: Is Apify free?**
A: Yes, within limits. Crawlee (the underlying library) is open source under Apache-2.0, so self-hosting it on your own infrastructure is fully free. The Apify cloud has a free tier ($5/month in platform credit) and paid plans for production.

**Q: Crawlee vs Apify SDK?**
A: Crawlee is the standalone scraping library (Apache-2.0). The Apify SDK adds cloud features on top of it (`Actor.init`, `getInput`, `pushData`, proxy configuration). For local-only scrapers, Crawlee alone is enough.

**Q: Can I publish my Actor to the Apify Store?**
A: Yes. Actors can be listed publicly on apify.com/store, either free or with paid, usage-based pricing. Apify takes a cut of paid usage and handles billing. Many Apify Store Actors are run by AI agents via the API.

---

## Source & Thanks

> Built by [Apify](https://github.com/apify). Licensed under Apache-2.0 (Crawlee).
>
> [apify/apify-sdk-js](https://github.com/apify/apify-sdk-js)

---

Source: https://tokrepo.com/en/workflows/apify-actor-sdk-headless-web-automation-at-cloud-scale
Author: Apify
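
### Calling the API from TypeScript

As a companion to the curl calls under "Run locally vs in cloud" and "Use the dataset", here is a minimal sketch of the same two requests assembled in TypeScript. The helper names (`buildRunRequest`, `buildDatasetUrl`), the `ScraperInput` shape, and the sample Actor ID and token in the usage note are hypothetical and not part of the Apify SDK; the endpoints are the ones used in the curl examples, and sending the input with a JSON `Content-Type` header is this sketch's assumption.

```typescript
// Hypothetical helpers (not part of the Apify SDK). They only assemble the
// two REST calls from the curl examples, so they can be checked offline.

const API_BASE = "https://api.apify.com/v2";

// Mirrors the input read by Actor.getInput() in src/main.ts.
interface ScraperInput {
  startUrls: string[];
  maxRequests?: number;
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build the POST request that starts a run of the given Actor.
function buildRunRequest(
  actorId: string,
  token: string,
  input: ScraperInput,
): PreparedRequest {
  const url =
    `${API_BASE}/acts/${encodeURIComponent(actorId)}/runs` +
    `?token=${encodeURIComponent(token)}`;
  return {
    url,
    init: {
      method: "POST",
      // Assumption: the Actor input is sent as a JSON body.
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(input),
    },
  };
}

// Build the GET URL for the dataset items of the Actor's last run.
function buildDatasetUrl(
  actorId: string,
  token: string,
  format: "json" | "csv" | "xlsx" = "json",
): string {
  return (
    `${API_BASE}/acts/${encodeURIComponent(actorId)}/runs/last/dataset/items` +
    `?format=${format}&token=${encodeURIComponent(token)}`
  );
}
```

To use it, pass the result to `fetch`, for example `const { url, init } = buildRunRequest(actorId, token, input); await fetch(url, init);`, and read results with `await fetch(buildDatasetUrl(actorId, token, "csv"))`. Keeping URL construction separate from the network call makes the request shape easy to test without touching the Apify cloud.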