Scripts · May 7, 2026 · 3 min read

Apify Actor SDK — Headless Web Automation at Cloud Scale

The Apify SDK turns a Crawlee/Playwright script into a managed cloud Actor. Auto-retries, proxy rotation, dataset storage, and a request queue come out of the box.

Agent ready

This asset can be read and installed directly by agents

Stage only · 17/100
Agent surface: Any MCP/CLI agent
Kind: Skill
Install: Stage only
Trust: New
Entrypoint: Asset
Universal CLI install command
npx tokrepo install 166ace25-f259-4b36-bf32-0ccf499d441b
Intro

The Apify SDK packages a Crawlee or Playwright script into an Apify Actor: a containerized program that runs on Apify's cloud with proxy rotation, retries, dataset persistence, a request queue, and scheduling built in. You write the scraping logic; the SDK handles the infrastructure.

Best for: production scrapers and browser-automation agents that need reliability and observability without building their own queue.
Works with: Node 20+, Python 3.10+.
Setup time: about 5 minutes (npx apify-cli create).


Scaffold an Actor

# When prompted, pick the "Crawlee + PlaywrightCrawler" template
npx apify-cli create my-scraper
cd my-scraper

Write the scraping logic

// src/main.ts
import { Actor, log } from "apify";
import { PlaywrightCrawler } from "crawlee";

await Actor.init();

const { startUrls, maxRequests = 100 } = (await Actor.getInput<{
  startUrls: string[];
  maxRequests?: number;
}>())!;

const crawler = new PlaywrightCrawler({
  maxRequestsPerCrawl: maxRequests,

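  async requestHandler({ page, request, enqueueLinks }) {
    log.info(`Crawling ${request.url}`);

    const title = await page.title();
    // .first() avoids a strict-mode error on pages with several <article> elements
    const content = await page.locator("article").first().textContent();

    // Store one record per page in the run's default dataset
    await Actor.pushData({
      url: request.url,
      title,
      content: content?.slice(0, 5000), // cap stored text at 5,000 characters
    });

    // Queue links on this page whose URL matches the glob built from the current URL
    await enqueueLinks({ globs: [`${request.url}**`] });
  },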
});

await crawler.run(startUrls);
await Actor.exit();
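
The script above trusts whatever Actor.getInput() returns (the trailing `!` silences the null case). As a rough sketch, input could be checked before the crawler starts. The `ScraperInput` type and `parseInput` helper below are hypothetical names introduced for illustration, not part of the Apify SDK; the default of 100 mirrors the example.

```typescript
// Hypothetical input shape, mirroring the fields the example reads from Actor.getInput().
interface ScraperInput {
  startUrls: string[];
  maxRequests: number;
}

// Validate untrusted input and apply the same default (100) as the example script.
function parseInput(raw: unknown): ScraperInput {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("Input is missing: expected an object with startUrls");
  }
  const { startUrls, maxRequests = 100 } = raw as {
    startUrls?: unknown;
    maxRequests?: unknown;
  };
  if (!Array.isArray(startUrls) || startUrls.length === 0) {
    throw new Error("startUrls must be a non-empty array");
  }
  // Keep only entries that look like http(s) URL strings.
  const urls = startUrls.filter(
    (u): u is string => typeof u === "string" && /^https?:\/\//.test(u),
  );
  if (urls.length !== startUrls.length) {
    throw new Error("Every startUrls entry must be an http(s) URL string");
  }
  if (
    typeof maxRequests !== "number" ||
    !Number.isInteger(maxRequests) ||
    maxRequests < 1
  ) {
    throw new Error("maxRequests must be a positive integer");
  }
  return { startUrls: urls, maxRequests };
}
```

In src/main.ts this could replace the destructuring line with something like `const { startUrls, maxRequests } = parseInput(await Actor.getInput());`, so a missing or malformed input fails with a readable message instead of a destructuring error.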

Run locally vs in cloud

# Locally
apify run

# Push to Apify cloud
apify push

# Run via API
curl -X POST "https://api.apify.com/v2/acts/<ACTOR_ID>/runs?token=$APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "startUrls": ["https://example.com/blog"], "maxRequests": 50 }'
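
The same API call can be assembled from TypeScript. The sketch below only builds the request description; `RunRequest` and `buildRunRequest` are hypothetical names, and the endpoint and token query parameter are taken from the curl example above.

```typescript
// Hypothetical plain-object description of the HTTP call; not an Apify client API.
interface RunRequest {
  url: string;
  method: "POST";
  headers: { "Content-Type": string };
  body: string;
}

// Build the "start a run" request for the endpoint used in the curl example.
function buildRunRequest(
  actorId: string,
  token: string,
  input: { startUrls: string[]; maxRequests?: number },
): RunRequest {
  const url =
    `https://api.apify.com/v2/acts/${encodeURIComponent(actorId)}` +
    `/runs?token=${encodeURIComponent(token)}`;
  return {
    url,
    method: "POST",
    // The JSON body is passed to the Actor as its input.
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(input),
  };
}
```

On Node 18+ the result can be handed to the global fetch, for example `const { url, ...init } = buildRunRequest(id, token, input); await fetch(url, init);`. Keeping the builder separate from the network call makes it easy to unit-test.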

Use the dataset

# Fetch results as JSON / CSV / XLSX
curl "https://api.apify.com/v2/acts/<ACTOR_ID>/runs/last/dataset/items?format=json&token=$APIFY_TOKEN"
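
For scripts that switch between the formats named above, the URL can be built with a small helper. `DatasetFormat` and `lastRunItemsUrl` are hypothetical names; the sketch covers only the three formats this page mentions and treats the token as optional.

```typescript
// Only the formats mentioned in the text above.
type DatasetFormat = "json" | "csv" | "xlsx";

// Build the "items from the last run's dataset" URL used in the curl example.
function lastRunItemsUrl(
  actorId: string,
  format: DatasetFormat = "json",
  token?: string,
): string {
  const base =
    `https://api.apify.com/v2/acts/${encodeURIComponent(actorId)}` +
    `/runs/last/dataset/items`;
  const params = [`format=${format}`];
  if (token) params.push(`token=${encodeURIComponent(token)}`);
  return `${base}?${params.join("&")}`;
}
```

Typing the format as a union means a typo such as "xls" is caught at compile time rather than by the API.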

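Actors built this way retry failed pages, keep crawl state in the request queue so an interrupted run can resume, and store results in a dataset; proxy rotation applies once a proxy configuration is set on the crawler. Many of the 4,000+ Actors in the Apify Store are built on this same SDK.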
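
To make the queue-and-retry behaviour concrete, here is a toy in-memory model. It is a conceptual sketch only, not Crawlee's implementation: the `crawl` function, the `Handler` type, and the default of 3 retries are assumptions chosen for illustration.

```typescript
// A handler returns newly discovered links and throws when the page fails.
type Handler = (url: string) => string[];

interface QueueItem {
  url: string;
  retries: number;
}

interface CrawlResult {
  handled: string[];
  failed: string[];
}

// Toy model: de-duplicated FIFO queue with a bounded number of retries per URL.
function crawl(startUrls: string[], handler: Handler, maxRetries = 3): CrawlResult {
  const seen = new Set<string>();
  const queue: QueueItem[] = [];
  const handled: string[] = [];
  const failed: string[] = [];

  // Each URL is enqueued at most once, however often it is discovered.
  const enqueue = (url: string): void => {
    if (seen.has(url)) return;
    seen.add(url);
    queue.push({ url, retries: 0 });
  };
  startUrls.forEach(enqueue);

  while (queue.length > 0) {
    const item = queue.shift()!;
    try {
      handler(item.url).forEach(enqueue);
      handled.push(item.url);
    } catch {
      if (item.retries < maxRetries) {
        // Put the failed URL back at the end of the queue for another attempt.
        queue.push({ url: item.url, retries: item.retries + 1 });
      } else {
        failed.push(item.url);
      }
    }
  }
  return { handled, failed };
}
```

A flaky page that succeeds on a later attempt ends up in `handled`; one that fails on the first try and on every retry ends up in `failed`. A real queue also persists this state to storage, which is what lets a restarted run pick up where it left off.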


FAQ

Q: Is Apify free? A: Partly. Crawlee (the underlying library) is open source under Apache-2.0, so self-hosting it on your own infrastructure costs nothing. The Apify cloud has a free tier ($5/mo platform credit) and paid plans for production.

Q: Crawlee vs Apify SDK? A: Crawlee is the standalone scraping library (Apache-2.0). The Apify SDK adds the cloud features around it (Actor.init, getInput, pushData, proxy config). For local-only scrapers, Crawlee alone is enough.

Q: Can I publish my Actor to the Apify Store? A: Yes. Actors can be published on apify.com/store as free or paid (usage-based). Apify takes a cut of paid usage and handles billing. Store Actors can also be run by AI agents via the API.


Quick Use

  1. npx apify-cli create my-scraper (pick a template)
  2. Edit src/main.ts to define your crawl logic
  3. apify run locally; apify push to deploy to Apify Cloud

Source & Thanks

Built by Apify. Licensed under Apache-2.0 (Crawlee).

apify/apify-sdk-js — ⭐ Active
