# Apify Actor SDK — Headless Web Automation at Cloud Scale

> The Apify SDK turns a Crawlee/Playwright script into a managed cloud Actor, with auto-retries, proxy rotation, dataset storage, and a request queue out of the box.

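## Install

Install the Apify CLI globally with `npm install -g apify-cli`, then run `apify login` with your API token so that `apify push` can deploy to your account. Scaffolding alone works without a global install via `npx apify-cli create`.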

## Quick Use

1. `npx apify-cli create my-scraper` (pick a template)
2. Edit `src/main.ts` to define your crawl logic
3. `apify run` locally; `apify push` to deploy to Apify Cloud

---

## Intro

The Apify SDK packages a Crawlee or Playwright script into an Apify Actor: a containerized program that runs on Apify's cloud with built-in proxy rotation, retries, dataset persistence, a request queue, and scheduling. You write the scraping logic; the SDK and the platform handle the infrastructure. Best for: production scrapers and browser-automation agents that need reliability and observability without rolling your own queue. Works with: Node 20+ (JavaScript/TypeScript SDK) or Python 3.10+ (Python SDK). Setup time: about 5 minutes (`npx apify-cli create`).

---

### Scaffold an Actor

```bash
npx apify-cli create my-scraper
cd my-scraper
# When prompted, pick a TypeScript "Crawlee + Playwright" template
```

### Write the scraping logic

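```typescript
// src/main.ts
import { Actor, log } from "apify";
import { PlaywrightCrawler } from "crawlee";

interface Input {
  startUrls: string[];
  maxRequests?: number;
}

await Actor.init();

// getInput() resolves to null when no input was provided, so fail with a
// clear message instead of a TypeError during destructuring.
const input = await Actor.getInput<Input>();
if (!input || !Array.isArray(input.startUrls) || input.startUrls.length === 0) {
  throw new Error("Input must include a non-empty `startUrls` array.");
}
const { startUrls, maxRequests = 100 } = input;

const crawler = new PlaywrightCrawler({
  // Upper bound on the number of pages processed in this run.
  maxRequestsPerCrawl: maxRequests,

  async requestHandler({ page, request, enqueueLinks }) {
    log.info(`Crawling ${request.url}`);

    const title = await page.title();
    // allTextContents() returns [] when the page has no <article>, rather than
    // waiting out a locator timeout, and it tolerates multiple matches.
    const content = (await page.locator("article").allTextContents()).join("\n");

    // Each pushData() call appends one item to the run's default dataset.
    await Actor.pushData({
      url: request.url,
      title,
      content: content.slice(0, 5000),
    });

    // Follow only links whose URL starts with the current page's URL.
    await enqueueLinks({ globs: [`${request.url}**`] });
  },
});

await crawler.run(startUrls);
await Actor.exit();
```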
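When you use `apify run` locally, `Actor.getInput()` reads the input from the local default key-value store, which is `storage/key_value_stores/default/INPUT.json` in the project directory under the default storage setup (an assumption worth checking against your CLI version). A minimal sketch that writes a test input there with Node's built-in `fs`, using the same `startUrls`/`maxRequests` shape as `src/main.ts`; the helper name `writeLocalInput` is hypothetical:

```typescript
import { mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Same input shape that src/main.ts reads via Actor.getInput().
interface Input {
  startUrls: string[];
  maxRequests?: number;
}

// Hypothetical helper: writes INPUT.json into the default local key-value
// store (assumed path: storage/key_value_stores/default) and returns its path.
function writeLocalInput(projectDir: string, input: Input): string {
  const storeDir = join(projectDir, "storage", "key_value_stores", "default");
  mkdirSync(storeDir, { recursive: true });
  const file = join(storeDir, "INPUT.json");
  writeFileSync(file, JSON.stringify(input, null, 2));
  return file;
}

// Run from the project root before `apify run`; prints what the Actor will receive.
const file = writeLocalInput(process.cwd(), {
  startUrls: ["https://example.com/blog"],
  maxRequests: 50,
});
console.log(readFileSync(file, "utf8"));
```

Editing the JSON file by hand works just as well; the script only makes the expected location and shape explicit.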

### Run locally vs in cloud

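```bash
# Locally (input is read from storage/key_value_stores/default/INPUT.json)
apify run

# Push to Apify cloud (requires `apify login` first)
apify push

# Run via API; the Content-Type header makes the body arrive as JSON input
curl -X POST "https://api.apify.com/v2/acts/<ACTOR_ID>/runs?token=$APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "startUrls": ["https://example.com/blog"], "maxRequests": 50 }'
```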
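The same API call can be made from TypeScript with the global `fetch` available in Node 18+. A minimal sketch, assuming the `POST /v2/acts/{actorId}/runs` endpoint used above, an `APIFY_TOKEN` environment variable, and that the response wraps the run object in a `data` field; `buildRunRequest` and `startRun` are hypothetical helper names. The token goes in an `Authorization: Bearer` header instead of the query string so it stays out of URL logs:

```typescript
interface RunInput {
  startUrls: string[];
  maxRequests?: number;
}

// Hypothetical helper: builds the URL and fetch options for
// POST /v2/acts/{actorId}/runs. actorId may also be "username~actor-name".
function buildRunRequest(actorId: string, token: string, input: RunInput) {
  const url = `https://api.apify.com/v2/acts/${encodeURIComponent(actorId)}/runs`;
  const init = {
    method: "POST",
    headers: {
      // Token in a header instead of the query string keeps it out of URL logs.
      Authorization: `Bearer ${token}`,
      // The body becomes the run's INPUT with this content type; it must be
      // JSON for Actor.getInput() to return a parsed object.
      "Content-Type": "application/json",
    },
    body: JSON.stringify(input),
  };
  return { url, init };
}

// Starts a run and returns the run object from the response's `data` field.
async function startRun(actorId: string, input: RunInput) {
  const token = process.env.APIFY_TOKEN;
  if (!token) throw new Error("Set the APIFY_TOKEN environment variable first.");
  const { url, init } = buildRunRequest(actorId, token, input);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Apify API returned ${res.status}: ${await res.text()}`);
  const { data } = (await res.json()) as {
    data: { id: string; status: string; defaultDatasetId: string };
  };
  return data;
}

// Usage (makes a real API call):
// startRun("<ACTOR_ID>", { startUrls: ["https://example.com/blog"], maxRequests: 50 })
//   .then((run) => console.log(run.id, run.status));
```

Keeping request construction in a pure function makes it easy to check the URL, headers, and body without touching the network.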

### Use the dataset

```bash
# Fetch the last run's results; swap format=json for csv or xlsx
curl "https://api.apify.com/v2/acts/<ACTOR_ID>/runs/last/dataset/items?format=json&token=$APIFY_TOKEN"
```
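To pull results into code rather than a file, the dataset items endpoint also accepts `offset` and `limit` query parameters, so a large dataset can be read page by page. A sketch assuming that the `runs/last/dataset/items` endpoint above honors those parameters and that `fetch` is available globally; `datasetItemsUrl` and `fetchAllItems` are hypothetical helper names:

```typescript
// Hypothetical helper: URL for one page of the last run's dataset items.
function datasetItemsUrl(actorId: string, offset: number, limit: number, format = "json"): string {
  const params = new URLSearchParams({
    format,
    offset: String(offset),
    limit: String(limit),
  });
  return `https://api.apify.com/v2/acts/${encodeURIComponent(actorId)}/runs/last/dataset/items?${params}`;
}

// Reads every item page by page; a page shorter than pageSize signals the end.
async function fetchAllItems<T>(actorId: string, token: string, pageSize = 1000): Promise<T[]> {
  const items: T[] = [];
  for (let offset = 0; ; offset += pageSize) {
    const res = await fetch(datasetItemsUrl(actorId, offset, pageSize), {
      headers: { Authorization: `Bearer ${token}` },
    });
    if (!res.ok) throw new Error(`Apify API returned ${res.status}`);
    const page = (await res.json()) as T[];
    items.push(...page);
    if (page.length < pageSize) return items;
  }
}
```

Usage: `fetchAllItems<{ url: string; title: string }>("<ACTOR_ID>", process.env.APIFY_TOKEN!)` returns the items pushed by the scraper above, in dataset order.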

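Apify Actors follow the links you enqueue (including pagination links), retry failed requests (up to three retries per request by default, via Crawlee's `maxRequestRetries`), rotate proxies once you pass a `proxyConfiguration` to the crawler, and persist the request queue so an interrupted or migrated run can resume where it stopped. Many of the 4,000+ Apify Store Actors are built on this same SDK.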
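With the default of three retries, a page that keeps failing is attempted at most four times (one initial attempt plus three retries) before it is marked as failed. The library-free sketch below illustrates that bounded-retry idea; it is a conceptual model, not Crawlee's implementation, and `withRetries` is a hypothetical name:

```typescript
// Conceptual sketch (not Crawlee code): run `task`, retrying on failure
// up to `maxRetries` extra times, i.e. at most 1 + maxRetries attempts.
async function withRetries<T>(
  task: (attempt: number) => Promise<T>,
  maxRetries = 3,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      // Remember the failure and try again until the attempt budget runs out.
      lastError = err;
    }
  }
  throw lastError;
}
```

In a real crawler you do not write this loop yourself: set `maxRequestRetries` in the `PlaywrightCrawler` options and handle requests that exhaust their retries in `failedRequestHandler`.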

---

### FAQ

**Q: Is Apify free?**
A: Partly. Crawlee (the underlying library) is open source under Apache-2.0, and self-hosting it on your own infrastructure costs nothing beyond that infrastructure. The Apify cloud has a free tier ($5/month in platform credit) and paid plans for production.

**Q: Crawlee vs Apify SDK?**
A: Crawlee is the standalone scraping library (Apache-2.0). The Apify SDK adds platform integration on top of it (`Actor.init()`, `Actor.getInput()`, `Actor.pushData()`, proxy configuration). For local-only scrapers, Crawlee on its own is enough.

**Q: Can I publish my Actor to the Apify Store?**
A: Yes. Actors can be published publicly on apify.com/store, either free or paid (usage-based); Apify takes a cut of paid usage and handles billing. Many Store Actors are called by AI agents through the API.

---

## Source & Thanks

> Built by [Apify](https://github.com/apify). Licensed under Apache-2.0 (Crawlee).
>
> [apify/apify-sdk-js](https://github.com/apify/apify-sdk-js) — ⭐ Active

---
Source: https://tokrepo.com/en/workflows/apify-actor-sdk-headless-web-automation-at-cloud-scale
Author: Apify