Workflows · May 7, 2026 · 4 min read

Firecrawl Extract — Structured Data from Any URL

Firecrawl Extract pulls structured JSON from any URL using a Pydantic/Zod schema. Skip the regex/CSS dance — describe the shape, get clean data.

Agent-ready

Safe staging for this asset

This asset is staged first. The copied prompt asks the agent to inspect the staged files before activating scripts, MCP config, or global config.

Stage only · 17/100 · Policy: staging
Agent surface: Any MCP/CLI agent
Type: MCP config
Install: Stage only
Trust: Community
Entry: Asset

Safe staging command:
npx -y tokrepo@latest install 86ee8206-4917-4eee-95e2-50f8ab8c9e39 --target codex

This places the files in staging first; activation requires reviewing the README and the staged plan.

Introduction

Firecrawl Extract is the structured-data endpoint built on top of Firecrawl's scraper. Pass a URL and a JSON schema and get back validated data. No CSS selectors, no XPath, no regex: Firecrawl runs the page through an LLM with your schema and returns the result.

Best for: agents that scrape e-commerce sites, job boards, news sites, or any source whose data is structured but laid out differently on each site.
Works with: Firecrawl REST API, Firecrawl Python / Node SDK, MCP server.
Setup time: 2 minutes (sign up at firecrawl.dev for an API key).
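Since the REST API is listed among the supported interfaces, here is a rough sketch of what a raw extract request might look like using only the Python standard library. The endpoint path and the body field names (`urls`, `schema`, `prompt`) are assumptions based on Firecrawl's v1 REST API and should be checked against the current API reference. The helper `build_extract_request` is a hypothetical name, and the request is only built, never sent.

```python
import json
import urllib.request

# Assumption: endpoint path and field names follow Firecrawl's v1 REST API.
EXTRACT_ENDPOINT = "https://api.firecrawl.dev/v1/extract"


def build_extract_request(api_key, urls, schema, prompt=None):
    """Build (but do not send) a POST request for a structured extract."""
    body = {"urls": list(urls), "schema": schema}
    if prompt:
        body["prompt"] = prompt
    return urllib.request.Request(
        EXTRACT_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# A hand-written JSON schema describing the fields we want back.
product_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
    },
    "required": ["name", "price"],
}

request = build_extract_request(
    "fc-YOUR-KEY",
    ["https://store.example.com/widgets"],
    product_schema,
    prompt="Extract the headline product on this page",
)
# Sending is left out on purpose: urllib.request.urlopen(request) would hit the network.
```

In practice the SDK is likely the more convenient route; the sketch is only meant to show that the request is a plain JSON POST carrying the URLs, the schema, and an optional prompt.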


One-shot extract

from firecrawl import FirecrawlApp
from pydantic import BaseModel

app = FirecrawlApp(api_key="fc-YOUR-KEY")

# Describe the shape of the data you want back.
class Product(BaseModel):
    name: str
    price: float
    in_stock: bool
    rating: float | None = None  # optional: not every page shows a rating

result = app.extract(
    urls=["https://store.example.com/widgets"],
    schema=Product.model_json_schema(),
    prompt="Extract the headline product on this page",
)

print(result.data)
# {'name': 'Widget Pro', 'price': 49.99, 'in_stock': True, 'rating': 4.6}
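If downstream code wants a defensive check on the returned dictionary before using it, something like the following could work. This is a minimal standard-library sketch, not part of the Firecrawl SDK: `find_schema_problems` is a hypothetical helper that only handles flat schemas with simple types, and the schema shown is a hand-written stand-in for the `Product` model.

```python
def find_schema_problems(data, schema):
    """Return a list of problems found in `data`; an empty list means it looks fine."""
    json_types = {
        "string": str,
        "number": (int, float),
        "integer": int,
        "boolean": bool,
    }
    required = schema.get("required", [])
    problems = [f"missing required field: {name}" for name in required if name not in data]

    for name, spec in schema.get("properties", {}).items():
        if name not in data:
            continue
        value = data[name]
        if value is None:
            if name in required:
                problems.append(f"{name}: required but null")
            continue
        type_name = spec.get("type")
        expected = json_types.get(type_name)
        if expected is None:
            continue  # nested objects, arrays, and unions are out of scope here
        # bool is a subclass of int in Python, so reject it for non-boolean fields
        wrong_bool = isinstance(value, bool) and type_name != "boolean"
        if wrong_bool or not isinstance(value, expected):
            problems.append(f"{name}: expected {type_name}, got {type(value).__name__}")
    return problems


# Hand-written stand-in for the Product schema (rating is optional).
product_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
        "rating": {"type": "number"},
    },
    "required": ["name", "price", "in_stock"],
}

good = {"name": "Widget Pro", "price": 49.99, "in_stock": True, "rating": 4.6}
bad = {"name": "Widget Pro", "price": "49.99", "rating": None}

good_problems = find_schema_problems(good, product_schema)
bad_problems = find_schema_problems(bad, product_schema)
```

With Pydantic already in the project, `Product.model_validate(...)` on the returned dictionary would be the more natural check; the sketch above just avoids the extra dependency.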

Extract across many URLs at once

result = app.extract(
    urls=[
        "https://store.example.com/widget-1",
        "https://store.example.com/widget-2",
        "https://store.example.com/widget-3",
    ],
    schema={
        "type": "object",
        "properties": {
            "products": {
                "type": "array",
                "items": Product.model_json_schema(),
            }
        }
    },
)
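The multi-URL call wraps the per-item schema in an object with a `products` array. A small sketch of that wrapping, plus a tolerant way to iterate over the items afterwards, might look like this. Both helper names (`wrap_as_list`, `iter_items`) are hypothetical, and the sample payload is made up for illustration rather than taken from a real response.

```python
def wrap_as_list(item_schema, key="products"):
    """Wrap a per-item JSON schema in an object holding an array of items."""
    return {
        "type": "object",
        "properties": {
            key: {
                "type": "array",
                "items": item_schema,
            }
        },
    }


def iter_items(extracted, key="products"):
    """Yield items from a payload shaped like wrap_as_list(...), tolerating gaps."""
    for item in (extracted or {}).get(key) or []:
        yield item


item_schema = {"type": "object", "properties": {"name": {"type": "string"}}}
list_schema = wrap_as_list(item_schema)

# Illustrative payload only; a real response depends on the pages scraped.
sample = {"products": [{"name": "Widget 1"}, {"name": "Widget 2"}]}
names = [product["name"] for product in iter_items(sample)]
```

Keeping the wrapper in one function makes it easy to reuse the same item schema for both the one-shot and the multi-URL calls.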

Use as MCP server

Add to your MCP config:

{
  "mcpServers": {
    "firecrawl": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": { "FIRECRAWL_API_KEY": "fc-YOUR-KEY" }
    }
  }
}

Now Claude Code / Cursor / Codex CLI can call firecrawl_scrape, firecrawl_extract, firecrawl_crawl, firecrawl_map directly.
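If an MCP config file already lists other servers, the Firecrawl entry has to be merged in rather than pasted over the whole file. A minimal sketch of that merge, assuming the config is the JSON shape shown above; `add_firecrawl_server` and the `other-server` entry are hypothetical names used only for illustration.

```python
import json


def add_firecrawl_server(config, api_key):
    """Return a copy of an MCP config dict with the firecrawl server entry added."""
    merged = json.loads(json.dumps(config))  # cheap deep copy for JSON-shaped data
    servers = merged.setdefault("mcpServers", {})
    servers["firecrawl"] = {
        "command": "npx",
        "args": ["-y", "firecrawl-mcp"],
        "env": {"FIRECRAWL_API_KEY": api_key},
    }
    return merged


# Hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
updated = add_firecrawl_server(existing, "fc-YOUR-KEY")
print(json.dumps(updated, indent=2))
```

Where the config file lives differs per client, so reading and writing the file is left out of the sketch.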

Cost vs accuracy

| Endpoint | Cost | Use |
|----------|------|-----|
| /scrape | 1 credit | Just markdown, no LLM |
| /extract | 1-5 credits | Structured data via LLM |
| /crawl | 1 credit/page | Multi-page site dump |
| /map | Free | Discover all URLs on a domain first |
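To get a feel for a job's budget, the per-endpoint costs from the table can be turned into a rough low/high estimate. The numbers below are copied from the table as given in this article and may not match current pricing; `estimate_credits` and the example plan are hypothetical.

```python
# (low, high) credits per call, copied from the table above.
CREDITS = {
    "scrape": (1, 1),
    "extract": (1, 5),
    "crawl": (1, 1),  # per page
    "map": (0, 0),
}


def estimate_credits(calls):
    """Return (low, high) credit totals for a mapping of endpoint -> call count."""
    low = sum(CREDITS[endpoint][0] * count for endpoint, count in calls.items())
    high = sum(CREDITS[endpoint][1] * count for endpoint, count in calls.items())
    return low, high


# Example plan: map the domain once, crawl 200 pages, extract from 50 of them.
plan = {"map": 1, "crawl": 200, "extract": 50}
low, high = estimate_credits(plan)
print(f"Estimated credits: {low} to {high}")  # Estimated credits: 250 to 450
```

The spread comes entirely from /extract, which is why mapping first and extracting only from the pages that matter keeps the estimate tight.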

FAQ

Q: Is Firecrawl Extract free?
A: There is a free tier with 500 credits/month for testing. The Hobby plan starts at $19/mo for 5K credits. Self-hosting the open-source code is free, but you run your own crawler infrastructure.

Q: How is Extract different from regular Scrape?
A: Scrape returns the raw markdown of a page. Extract runs that markdown through an LLM with your schema and returns validated structured data. Extract costs more per call but skips post-processing entirely.

Q: Can I self-host Firecrawl?
A: Yes. The Firecrawl repo is open source (the core is AGPL-3.0; the SDKs are MIT) and runs on Docker. Self-hosting saves money at scale, but you manage Playwright, proxies, and the queue yourself. The hosted service is faster to start with.


Quick Use

  1. Sign up at firecrawl.dev and get an API key (free 500 credits)
  2. pip install firecrawl-py (or npm install @mendable/firecrawl-js)
  3. Use the Pydantic-schema snippet from the "One-shot extract" section

Source & Thanks

Built by Firecrawl (Mendable). The open-source repo is licensed under AGPL-3.0 (SDKs under MIT); the hosted service is commercial.

firecrawl/firecrawl — ⭐ 30,000+
