Workflows · May 7, 2026 · 4 min read

Firecrawl Extract — Structured Data from Any URL

Firecrawl Extract pulls structured JSON from any URL using a Pydantic/Zod schema. Skip the regex/CSS dance — describe the shape, get clean data.

Firecrawl · Community
Agent-ready

Safe staging for this asset

This asset is staged first. The copied prompt asks the agent to inspect the staged files before enabling scripts, MCP config, or global config.

Stage only · 17/100 · Policy: staging
Agent surface: any MCP/CLI agent
Type: MCP config
Installation: stage only
Trust: Community
Entry point: Asset
Safe staging command:
npx -y tokrepo@latest install 86ee8206-4917-4eee-95e2-50f8ab8c9e39 --target codex

This stages the files first; activation requires reviewing the README and the staged plan.

Introduction

Firecrawl Extract is the structured-data endpoint built on top of Firecrawl's scraper. Pass one or more URLs and a JSON schema, and get back data shaped to that schema. No CSS selectors, no XPath, no regex: Firecrawl runs the page content through an LLM together with your schema and returns the result.

Best for: agents that scrape e-commerce sites, job boards, news sites, or any source that is structured but laid out differently on each site.
Works with: the Firecrawl REST API, the Python and Node SDKs, and the MCP server.
Setup time: about 2 minutes (sign up at firecrawl.dev for an API key).


One-shot extract

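import os

from firecrawl import FirecrawlApp
from pydantic import BaseModel

# Read the key from the environment instead of hard-coding it
app = FirecrawlApp(api_key=os.environ["FIRECRAWL_API_KEY"])

# Describe the shape you want back; it is sent to Firecrawl as JSON Schema
class Product(BaseModel):
    name: str
    price: float
    in_stock: bool
    rating: float | None = None  # optional: not every page shows a rating (Python 3.10+ syntax)

result = app.extract(
    urls=["https://store.example.com/widgets"],
    schema=Product.model_json_schema(),
    prompt="Extract the headline product on this page",
)

print(result.data)
# Example output (illustrative):
# {'name': 'Widget Pro', 'price': 49.99, 'in_stock': True, 'rating': 4.6}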
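LLM output can still drift from the schema (a price returned as a string, a missing field), so it can be worth checking records before you use them. The following is a minimal, stdlib-only sketch; check_record and PRODUCT_SCHEMA are hypothetical names, and the checker handles only the flat types used by Product, not full JSON Schema. In a project that already uses Pydantic, Product.model_validate(result.data) does the same job more thoroughly.

```python
# Minimal stdlib-only check of one extracted record against a flat JSON schema.
# Covers only "required" and the primitive "type"/"anyOf" forms used by the
# Product example; this is NOT a full JSON Schema validator.

JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "null": type(None),
}


def _matches(value, prop_schema):
    # Pydantic emits Optional fields as {"anyOf": [{...}, {"type": "null"}]}
    if "anyOf" in prop_schema:
        return any(_matches(value, alt) for alt in prop_schema["anyOf"])
    expected = JSON_TYPES.get(prop_schema.get("type"))
    if expected is None:
        return True  # unsupported keyword: don't reject what we can't check
    # bool is a subclass of int in Python, so keep True/False out of numbers
    if isinstance(value, bool) and prop_schema.get("type") in ("number", "integer"):
        return False
    return isinstance(value, expected)


def check_record(record, schema):
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for key in schema.get("required", []):
        if key not in record:
            problems.append(f"missing required field: {key}")
    for key, prop_schema in schema.get("properties", {}).items():
        if key in record and not _matches(record[key], prop_schema):
            problems.append(f"wrong type for field: {key}")
    return problems


# Hand-written schema approximating the Product model (Pydantic's output
# also carries "title" keys, which this check simply ignores).
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
        "rating": {"anyOf": [{"type": "number"}, {"type": "null"}]},
    },
    "required": ["name", "price", "in_stock"],
}

sample = {"name": "Widget Pro", "price": 49.99, "in_stock": True, "rating": 4.6}
print(check_record(sample, PRODUCT_SCHEMA))  # []
print(check_record({"name": "X", "price": "49.99"}, PRODUCT_SCHEMA))
# ['missing required field: in_stock', 'wrong type for field: price']
```

A non-empty result is a signal to retry the extraction or flag the record, rather than pass bad data downstream.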

Extract across many URLs at once

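# Reuses `app` and `Product` from the previous snippet.
# Wrapping the item schema in an array lets one call return every product.
result = app.extract(
    urls=[
        "https://store.example.com/widget-1",
        "https://store.example.com/widget-2",
        "https://store.example.com/widget-3",
    ],
    schema={
        "type": "object",
        "properties": {
            "products": {
                "type": "array",
                "items": Product.model_json_schema(),
            }
        },
    },
)

print(result.data)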
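One caveat with nesting Product.model_json_schema() inline: if the model contains nested models, Pydantic places their definitions under a top-level "$defs" key and refers to them as "#/$defs/Name". Those references resolve against the document root, so once the item schema is nested inside a wrapper, its "$defs" should be moved up to the wrapper's root. A minimal sketch of that wrapping step; wrap_as_array is a hypothetical helper, and the Seller model below is invented for illustration.

```python
import copy


def wrap_as_array(item_schema, key="products"):
    """Wrap a per-item JSON schema as {"<key>": [item, ...]}.

    Any "$defs" block is hoisted to the wrapper's root so that
    "#/$defs/..." references inside the item schema still resolve.
    """
    item = copy.deepcopy(item_schema)  # don't mutate the caller's schema
    defs = item.pop("$defs", None)
    wrapper = {
        "type": "object",
        "properties": {key: {"type": "array", "items": item}},
    }
    if defs:
        wrapper["$defs"] = defs
    return wrapper


# Hand-written item schema shaped like Pydantic's output for a nested model
offer_item = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "seller": {"$ref": "#/$defs/Seller"},
    },
    "$defs": {"Seller": {"type": "object", "properties": {"name": {"type": "string"}}}},
}

schema = wrap_as_array(offer_item, key="offers")
print(sorted(schema))  # ['$defs', 'properties', 'type']
print("$defs" in schema["properties"]["offers"]["items"])  # False
```

For a flat model like Product, wrap_as_array(Product.model_json_schema()) produces the same wrapper as the inline dict in the snippet above.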

Use as MCP server

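Add the server to your MCP client's config. Claude Code and Cursor use this JSON "mcpServers" format; other clients (Codex CLI, for example, uses a TOML config file) need the same command, args, and env expressed in their own format: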

{
  "mcpServers": {
    "firecrawl": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": { "FIRECRAWL_API_KEY": "fc-YOUR-KEY" }
    }
  }
}

Once the server is registered, MCP clients such as Claude Code, Cursor, or Codex CLI can call the firecrawl_scrape, firecrawl_extract, firecrawl_crawl, and firecrawl_map tools directly.
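If you manage several MCP servers, you may prefer to add the Firecrawl entry programmatically rather than hand-editing JSON. A minimal stdlib-only sketch: add_firecrawl_server is a hypothetical helper, and because each client keeps its config file in a different place, reading and writing the file is left out. Note that the key still ends up in plaintext in whatever file you write.

```python
import json
import os


def add_firecrawl_server(config, api_key):
    """Return a copy of an MCP config dict with a "firecrawl" server added.

    Other entries under "mcpServers" are preserved; the server definition
    mirrors the JSON snippet above (npx -y firecrawl-mcp).
    """
    merged = json.loads(json.dumps(config))  # deep copy of plain JSON data
    servers = merged.setdefault("mcpServers", {})
    servers["firecrawl"] = {
        "command": "npx",
        "args": ["-y", "firecrawl-mcp"],
        "env": {"FIRECRAWL_API_KEY": api_key},
    }
    return merged


# An existing config with an unrelated (hypothetical) server already in it
existing = {"mcpServers": {"other": {"command": "other-server"}}}
key = os.environ.get("FIRECRAWL_API_KEY", "fc-YOUR-KEY")
print(json.dumps(add_firecrawl_server(existing, key), indent=2))
```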

Cost by endpoint

Endpoint | Cost | Use
---|---|---
/scrape | 1 credit | Markdown only, no LLM
/extract | 1-5 credits | Structured data via LLM
/crawl | 1 credit per page | Multi-page site dump
/map | Free | Discover all URLs on a domain first
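The costs in the table add up quickly once a workflow chains several endpoints. A minimal budgeting sketch using the table's figures; estimate_credits is a hypothetical helper, and it ASSUMES that Extract is billed per URL within the listed 1-5 range, which you should confirm against Firecrawl's current pricing.

```python
# Credit costs taken from the table above, as (low, high) per unit.
# ASSUMPTION: /extract is billed per URL; check Firecrawl's pricing page.
COSTS = {
    "scrape": (1, 1),   # per page scraped
    "extract": (1, 5),  # per URL (assumed)
    "crawl": (1, 1),    # per page crawled
    "map": (0, 0),      # listed as free
}


def estimate_credits(plan):
    """plan maps endpoint name -> number of pages/URLs; returns (low, high)."""
    low = sum(COSTS[endpoint][0] * n for endpoint, n in plan.items())
    high = sum(COSTS[endpoint][1] * n for endpoint, n in plan.items())
    return low, high


# Map a site once, crawl 200 pages, then run Extract on 50 product URLs
low, high = estimate_credits({"map": 1, "crawl": 200, "extract": 50})
print(low, high)  # 250 450
print(high <= 500)  # True: fits the 500-credit free tier even at the high end
```

Planning with the high end of the range keeps a batch job from running out of credits partway through.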

FAQ

Q: Is Firecrawl Extract free? A: There is a free tier with 500 credits/month for testing, and the Hobby plan starts at $19/mo for 5K credits. Self-hosting the open-source version is free, but you run your own crawler infrastructure.

Q: How is Extract different from regular Scrape? A: Scrape returns the page as raw markdown. Extract runs that content through an LLM with your schema and returns structured data in that shape. Extract costs more per call, but it replaces most of the parsing code you would otherwise write yourself.
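To make the contrast concrete, here is a sketch of the kind of post-processing Scrape leaves to you. Both markdown snippets are invented for illustration, and parse_product is a hypothetical helper: it works on one layout and silently returns wrong answers on another, which is the maintenance burden Extract's schema-driven approach is meant to remove.

```python
import re

# Hypothetical markdown of the kind /scrape might return for one store
STORE_A = """
# Widget Pro

**$49.99** | In stock

Rated 4.6 out of 5 (128 reviews)
"""

# The same product as a different (equally hypothetical) site might render it
STORE_B = "## Widget Pro\nPrice: 49,99 EUR\nAvailability: available"


def parse_product(md):
    """Brittle, layout-specific parsing: each pattern encodes one site's markup."""
    name = re.search(r"^# (.+)$", md, re.MULTILINE)
    price = re.search(r"\$(\d+(?:\.\d{2})?)", md)
    rating = re.search(r"Rated (\d(?:\.\d)?) out of 5", md)
    return {
        "name": name.group(1).strip() if name else None,
        "price": float(price.group(1)) if price else None,
        "in_stock": "in stock" in md.lower(),
        "rating": float(rating.group(1)) if rating else None,
    }


print(parse_product(STORE_A))
# {'name': 'Widget Pro', 'price': 49.99, 'in_stock': True, 'rating': 4.6}
print(parse_product(STORE_B))
# {'name': None, 'price': None, 'in_stock': False, 'rating': None}
```

Store B's product is actually in stock, yet the parser reports False; with Extract, the same Product schema is meant to cover both layouts without a second parser, though LLM output is still worth spot-checking.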

Q: Can I self-host Firecrawl? A: Yes. The Firecrawl repo is open source (AGPL-3.0) and ships with a Docker setup. Self-hosting saves money at scale, but you manage the Playwright browsers, proxies, and job queue yourself. The hosted service is faster to get started with.
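Switching between hosted and self-hosted mostly comes down to which base URL the client talks to. A minimal sketch; resolve_firecrawl_target is a hypothetical helper, and it assumes the environment variable names FIRECRAWL_API_URL and FIRECRAWL_API_KEY (confirm the names your SDK and MCP server versions read). The port 3002 in the example is only an illustration; use your deployment's URL.

```python
import os

HOSTED_URL = "https://api.firecrawl.dev"


def resolve_firecrawl_target(env=None):
    """Return (api_url, api_key) for either hosted or self-hosted Firecrawl.

    Reads FIRECRAWL_API_URL and FIRECRAWL_API_KEY from `env` (defaults to
    os.environ). Without FIRECRAWL_API_URL it falls back to the hosted API,
    which requires a key; a self-hosted instance may run without one,
    depending on how its authentication is configured.
    """
    env = os.environ if env is None else env
    api_url = env.get("FIRECRAWL_API_URL", HOSTED_URL).rstrip("/")
    api_key = env.get("FIRECRAWL_API_KEY")
    if api_url == HOSTED_URL and not api_key:
        raise RuntimeError("FIRECRAWL_API_KEY is required for the hosted API")
    return api_url, api_key


# Self-hosted example (assumed local port)
print(resolve_firecrawl_target({"FIRECRAWL_API_URL": "http://localhost:3002/"}))
# ('http://localhost:3002', None)
```

The resulting pair can then be handed to the client, for example FirecrawlApp(api_key=api_key, api_url=api_url); the Python SDK accepts an api_url argument in the versions we know of, but check yours.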


Quick Use

  1. Sign up at firecrawl.dev and get an API key (the free tier includes 500 credits)
  2. pip install firecrawl-py (or npm install @mendable/firecrawl-js)
  3. Run the Pydantic-schema snippet from the One-shot extract section


Source & Thanks

Built by Firecrawl (Mendable). The open-source core is licensed under AGPL-3.0 (the SDKs are MIT); the hosted service is commercial.

firecrawl/firecrawl — ⭐ 30,000+
