Scripts · May 7, 2026 · 3 min read

Apify Actor SDK — Headless Web Automation at Cloud Scale

The Apify SDK turns a Crawlee/Playwright script into a managed cloud Actor, with automatic retries, proxy rotation, dataset storage, and a request queue out of the box.

Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, an adapter-specific plan, and raw content so that agents can assess compatibility, risk, and next steps.

Stage only · 17/100
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Stage only
Trust: New
Entry: Asset

Universal CLI command
npx tokrepo install 166ace25-f259-4b36-bf32-0ccf499d441b
Introduction

The Apify SDK packages a Crawlee or Playwright script into an Apify Actor: a containerized program that runs on Apify's cloud with built-in proxy rotation, retries, dataset persistence, a request queue, and scheduling. You write the scraping logic; the SDK handles the infrastructure.

Best for: production scrapers and browser-automation agents that need reliability and observability without building their own queue.
Works with: Node 20+, Python 3.10+.
Setup time: about 5 minutes (npx apify-cli create).


Scaffold an Actor

# Pick the "Crawlee + PlaywrightCrawler" template when prompted
npx apify-cli create my-scraper
cd my-scraper

Write the scraping logic

// src/main.ts
import { Actor, log } from "apify";
import { PlaywrightCrawler } from "crawlee";

await Actor.init();

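// Actor.getInput() resolves to null when the run has no input, so fail early with a clear message.
const input = await Actor.getInput<{
  startUrls: string[];
  maxRequests?: number;
}>();
if (!input?.startUrls?.length) {
  throw new Error("Input must include a non-empty startUrls array.");
}
const { startUrls, maxRequests = 100 } = input;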

const crawler = new PlaywrightCrawler({
  maxRequestsPerCrawl: maxRequests,

  async requestHandler({ page, request, enqueueLinks }) {
    log.info(`Crawling ${request.url}`);

    const title = await page.title();
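    // .first() avoids a strict-mode error when a page contains several <article> elements,
    // and the count() check skips the wait when there is none.
    const article = page.locator("article").first();
    const content = (await article.count()) > 0 ? await article.textContent() : null;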

    await Actor.pushData({
      url: request.url,
      title,
      content: content?.slice(0, 5000),
    });

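    // Follow only links beneath the current URL. The "/" before "**" matters:
    // without it the glob cannot cross path separators.
    await enqueueLinks({ globs: [`${request.url.replace(/\/$/, "")}/**`] });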
  },
});

await crawler.run(startUrls);
await Actor.exit();
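
Actor.getInput() resolves to null when a run starts without input, so it can help to validate the input before building the crawler. The following is a minimal sketch of such a check as a plain function. The names parseInput and ScraperInput are hypothetical and not part of the Apify SDK; the default of 100 mirrors the example above.

```typescript
// Hypothetical helper (not part of the Apify SDK): validates raw Actor input
// before it is handed to the crawler.
interface ScraperInput {
  startUrls: string[];
  maxRequests: number;
}

function parseInput(raw: unknown): ScraperInput {
  // A run started without input yields null, so reject anything that is not an object.
  if (typeof raw !== "object" || raw === null) {
    throw new Error("Input is missing: expected an object with startUrls.");
  }
  const { startUrls, maxRequests } = raw as Record<string, unknown>;

  // Every entry must be an http(s) URL given as a string.
  if (!Array.isArray(startUrls) || startUrls.length === 0) {
    throw new Error("startUrls must be a non-empty array of URLs.");
  }
  for (const url of startUrls) {
    if (typeof url !== "string" || !/^https?:\/\//.test(url)) {
      throw new Error(`Invalid start URL: ${String(url)}`);
    }
  }

  // Fall back to the same default as the example above when maxRequests is absent.
  if (maxRequests === undefined) {
    return { startUrls: startUrls as string[], maxRequests: 100 };
  }
  if (typeof maxRequests !== "number" || !Number.isInteger(maxRequests) || maxRequests < 1) {
    throw new Error("maxRequests must be a positive integer.");
  }
  return { startUrls: startUrls as string[], maxRequests };
}
```

In src/main.ts, the input-reading step could then become: const { startUrls, maxRequests } = parseInput(await Actor.getInput());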

Run locally vs in cloud

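# Locally (assumes apify-cli is installed globally: npm install -g apify-cli)
apify run

# Push to Apify cloud (log in once with: apify login)
apify push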

# Run via API
curl -X POST "https://api.apify.com/v2/acts/<ACTOR_ID>/runs?token=$APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "startUrls": ["https://example.com/blog"], "maxRequests": 50 }'
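
The same request can be prepared from TypeScript. This sketch only builds the URL and the fetch options, so it can be inspected without a network call. The helper buildRunRequest is hypothetical, and it assumes the token is sent as a Bearer header instead of the token query parameter used in the curl example.

```typescript
// Hypothetical helper: prepares a "run Actor" request for the Apify API without sending it.
interface RunRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildRunRequest(actorId: string, token: string, input: object): RunRequest {
  return {
    // Encode the ID so it is safe as a single path segment.
    url: `https://api.apify.com/v2/acts/${encodeURIComponent(actorId)}/runs`,
    init: {
      method: "POST",
      headers: {
        // The body is the Actor input, so it is declared as JSON.
        "Content-Type": "application/json",
        // Assumption: token passed as a Bearer header rather than ?token=...
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify(input),
    },
  };
}
```

On Node 20+, the result can be passed straight to the built-in fetch: const { url, init } = buildRunRequest(actorId, token, input); const response = await fetch(url, init);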

Use the dataset

# Fetch results from the last run; set format to json, csv, or xlsx
curl "https://api.apify.com/v2/acts/<ACTOR_ID>/runs/last/dataset/items?format=json&token=$APIFY_TOKEN"
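
Large datasets are usually read in pages rather than in one response. The sketch below builds item URLs for the endpoint shown above using the format, offset, and limit query parameters. The helpers datasetItemsUrl and pageOffsets are hypothetical, the page size is an arbitrary choice, and the total item count is assumed to be known already.

```typescript
type DatasetFormat = "json" | "csv" | "xlsx";

// Hypothetical helper: URL for one page of items from the last run's default dataset.
function datasetItemsUrl(actorId: string, format: DatasetFormat, offset: number, limit: number): string {
  const query = [`format=${format}`, `offset=${offset}`, `limit=${limit}`].join("&");
  return `https://api.apify.com/v2/acts/${encodeURIComponent(actorId)}/runs/last/dataset/items?${query}`;
}

// Offsets needed to read `total` items in pages of `pageSize`.
function pageOffsets(total: number, pageSize: number): number[] {
  if (pageSize < 1) throw new Error("pageSize must be at least 1");
  const offsets: number[] = [];
  for (let offset = 0; offset < total; offset += pageSize) {
    offsets.push(offset);
  }
  return offsets;
}
```

For example, pageOffsets(2500, 1000).map((offset) => datasetItemsUrl(actorId, "json", offset, 1000)) yields three URLs to fetch in turn. A private Actor still needs the API token on each request.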

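Within an Actor, Crawlee retries failed requests, and enqueueLinks follows the links it finds, including pagination links that match your globs. The Apify platform adds proxy rotation and stores the request queue, so an interrupted run can resume where it stopped. The 4,000+ Actors in the Apify Store are built on this same SDK.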
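
To make the retry behaviour concrete, here is a small illustration of retrying a failing task a bounded number of times. It is not Crawlee's implementation: Crawlee handles retries through its request queue and the maxRequestRetries crawler option. The helper withRetries and its parameters are hypothetical.

```typescript
// Illustration only: run `task` until it succeeds or `maxRetries` extra attempts are used up.
async function withRetries<T>(
  task: (attempt: number) => Promise<T>,
  maxRetries: number,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      // Remember the failure and move on to the next attempt.
      lastError = error;
    }
  }
  // Every attempt failed, so surface the most recent error to the caller.
  throw lastError;
}
```

A production crawler would normally add a delay between attempts and give up early on errors that cannot succeed on retry; both are left out here to keep the sketch short.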


FAQ

Q: Is Apify free?
A: Yes, within limits. Crawlee, the underlying library, is open source under Apache-2.0, and running it on your own infrastructure is free. The Apify cloud has a free tier ($5/mo platform credit) and paid plans for production.

Q: Crawlee vs Apify SDK?
A: Crawlee is the standalone scraping library (Apache-2.0). The Apify SDK is used alongside it and adds the cloud features (Actor.init, getInput, pushData, proxy config). For local-only scrapers, Crawlee alone is enough.

Q: Can I publish my Actor to the Apify Store?
A: Yes. Actors can be published on apify.com/store as public Actors, either free or with paid, usage-based pricing. Apify takes a cut of paid usage and handles billing. Many Apify Store Actors are run by AI agents via the API.


Quick Use

  1. npx apify-cli create my-scraper (pick a template)
  2. Edit src/main.ts to define your crawl logic
  3. Run apify run to test locally, then apify push to deploy to the Apify cloud

Source & Thanks

Built by Apify. Licensed under Apache-2.0 (Crawlee).

apify/apify-sdk-js — ⭐ Active
