Apify Actor SDK — Headless Web Automation at Cloud Scale

Quick Use

1. Run npx apify-cli create my-scraper and pick a template.
2. Edit src/main.ts to define your crawl logic.
3. Run apify run to test locally, then apify push to deploy to Apify Cloud.

Intro

The Apify SDK turns a Crawlee or Playwright script into an Apify Actor: a containerized program that runs on Apify's cloud with built-in proxy rotation, retries, dataset persistence, a request queue, and scheduling. You write the scraping logic; the SDK and the platform handle the infrastructure.

Best for: production scrapers and browser-automation agents that need reliability and observability without building their own queue.
Works with: Node 20+, Python 3.10+.
Setup time: about 5 minutes (npx apify-cli create).

Scaffold an Actor

# When prompted, pick the "Crawlee + PlaywrightCrawler" template
npx apify-cli create my-scraper
cd my-scraper

Write the scraping logic

// src/main.ts
import { Actor, log } from "apify";
import { PlaywrightCrawler } from "crawlee";

await Actor.init();

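// getInput() resolves to null when the run has no input, so check before use.
const input = await Actor.getInput<{
  startUrls: string[];
  maxRequests?: number;
}>();
if (!input?.startUrls?.length) {
  throw new Error('Input must contain a non-empty "startUrls" array');
}
const { startUrls, maxRequests = 100 } = input;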

const crawler = new PlaywrightCrawler({
  maxRequestsPerCrawl: maxRequests,

  async requestHandler({ page, request, enqueueLinks }) {
    log.info(`Crawling ${request.url}`);

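    const title = await page.title();
    // Use the first <article>, if any. A bare locator would wait until timeout
    // on pages without one and throw in strict mode on pages with several.
    const article = page.locator("article").first();
    const content = (await article.count()) > 0 ? await article.textContent() : null;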

    await Actor.pushData({
      url: request.url,
      title,
      content: content?.slice(0, 5000),
    });

    // Only enqueue links that match a glob built from the current page's URL.
    await enqueueLinks({ globs: [`${request.url}**`] });
  },
});

await crawler.run(startUrls);
await Actor.exit();
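
The handler above builds its enqueueLinks glob from the current request's URL, so the allowed scope changes from page to page. One alternative, sketched here on the assumption that the crawl should stay under each start URL's path, is to derive the glob once per start URL. scopeGlob is a hypothetical helper, not part of Crawlee or the Apify SDK.

```typescript
// Hypothetical helper (not a Crawlee or Apify SDK function): turn a start URL
// into a glob that covers every URL underneath that path.
function scopeGlob(startUrl: string): string {
  const url = new URL(startUrl);
  // Ignore query string and fragment, and strip trailing slashes so that
  // "/docs/" and "/docs" produce the same glob.
  const path = url.pathname.replace(/\/+$/, "");
  return `${url.origin}${path}/**`;
}

// One glob per start URL, computed once before the crawl starts.
const scopeGlobs = [
  "https://example.com/blog",
  "https://example.com/docs/?page=2",
].map(scopeGlob);

console.log(scopeGlobs);
```

The resulting array could be passed to the handler as enqueueLinks({ globs: scopeGlobs }) so that every page uses the same scope.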

Run locally vs in cloud

# Locally
apify run

# Push to Apify cloud
apify push

# Run via API
curl -X POST "https://api.apify.com/v2/acts/<ACTOR_ID>/runs?token=$APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "startUrls": ["https://example.com/blog"], "maxRequests": 50 }'
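
The same API call can be made from TypeScript. The sketch below only builds the request, so it can be inspected before anything is sent. It assumes the endpoint and token query parameter shown in the curl command, and that the input is sent as a JSON body. buildRunRequest and the ScraperInput type are illustrative names, not part of any Apify client library.

```typescript
// Input shape for this Actor, mirroring the type used in src/main.ts.
interface ScraperInput {
  startUrls: string[];
  maxRequests?: number;
}

interface RunRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical helper: build the POST request that starts an Actor run.
function buildRunRequest(actorId: string, token: string, input: ScraperInput): RunRequest {
  const url = new URL(`https://api.apify.com/v2/acts/${encodeURIComponent(actorId)}/runs`);
  url.searchParams.set("token", token);
  return {
    url: url.toString(),
    init: {
      method: "POST",
      // The JSON body becomes the Actor's input.
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(input),
    },
  };
}

// "my-actor-id" and "my-token" are placeholders for real values.
const runRequest = buildRunRequest("my-actor-id", "my-token", {
  startUrls: ["https://example.com/blog"],
  maxRequests: 50,
});
console.log(runRequest.url);
```

On Node 18 or later, the request could then be sent with await fetch(runRequest.url, runRequest.init).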

Use the dataset

# Fetch results as JSON; use format=csv or format=xlsx for the other formats
curl "https://api.apify.com/v2/acts/<ACTOR_ID>/runs/last/dataset/items?format=json&token=$APIFY_TOKEN"
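
To consume the dataset from code, a minimal sketch follows. It assumes the endpoint shown above, a token passed as a query parameter, and items shaped like the records pushed in src/main.ts. datasetItemsUrl, parseItems, and PageRecord are hypothetical names used for illustration only.

```typescript
type DatasetFormat = "json" | "csv" | "xlsx";

// Record shape pushed by the request handler in src/main.ts.
interface PageRecord {
  url: string;
  title: string;
  content?: string | null;
}

// Hypothetical helper: URL for the items of the Actor's last run.
function datasetItemsUrl(actorId: string, format: DatasetFormat, token?: string, limit?: number): string {
  const url = new URL(
    `https://api.apify.com/v2/acts/${encodeURIComponent(actorId)}/runs/last/dataset/items`,
  );
  url.searchParams.set("format", format);
  if (limit !== undefined) url.searchParams.set("limit", String(limit));
  if (token) url.searchParams.set("token", token);
  return url.toString();
}

// Type guard: keep only items that carry the fields the scraper writes.
function isPageRecord(value: unknown): value is PageRecord {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record.url === "string" && typeof record.title === "string";
}

// Parse a format=json response body into typed records.
function parseItems(body: string): PageRecord[] {
  const data: unknown = JSON.parse(body);
  if (!Array.isArray(data)) throw new Error("Expected a JSON array of dataset items");
  return data.filter(isPageRecord);
}

// "my-actor-id" and "my-token" are placeholders for real values.
console.log(datasetItemsUrl("my-actor-id", "json", "my-token", 10));
```

The type guard drops malformed items rather than failing the whole batch, which suits exploratory use; a stricter pipeline might throw instead.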

On the platform, Actors retry failed pages, rotate proxies, and keep their request queue and dataset in Apify storage, so crawl state survives restarts. Many of the 4,000+ Apify Store Actors are built on this same SDK.

FAQ

Q: Is Apify free?
A: Partly. Crawlee (the underlying library) is Apache-2.0 open source, and self-hosting it on your own infrastructure is free. The Apify cloud has a free tier ($5/mo platform credit) and paid plans for production.

Q: Crawlee vs Apify SDK?
A: Crawlee is the standalone scraping library (Apache-2.0). The Apify SDK adds the cloud features around it (Actor.init, getInput, pushData, proxy config). For local-only scrapers, Crawlee alone is enough.

Q: Can I publish my Actor to the Apify Store?
A: Yes. Actors can be published on apify.com/store as public Actors, either free or with paid usage-based pricing. Apify takes a cut of paid usage and handles billing. Many Apify Store Actors are run by AI agents via the API.

Source & Thanks

Built by Apify. Licensed under Apache-2.0 (Crawlee).

apify/apify-sdk-js — ⭐ Active

Similar assets

Crawlee — Web Scraping and Browser Automation Library

Crawlee — Production Web Scraping for Node.js

Boto3 — The Official AWS SDK for Python

Dynamo — Datacenter-Scale Distributed Inference Serving Framework