Workflows · May 7, 2026 · 4 min read

Firecrawl Extract — Structured Data from Any URL

Firecrawl Extract pulls structured JSON from any URL using a Pydantic/Zod schema. Skip the regex/CSS dance — describe the shape, get clean data.

Firecrawl · Community
Agent-ready

This asset can be read and installed directly by agents

TokRepo exposes a universal CLI command, an install contract, JSON metadata, a per-adapter plan, and the raw content to help agents judge fit, risk, and next actions.

Stage only · 5/100
Agent surface
Any MCP/CLI agent
Type
MCP Config
Installation
Stage only
Trust
Trust: New
Entry point
Asset
Universal CLI command
npx tokrepo install 86ee8206-4917-4eee-95e2-50f8ab8c9e39
Introduction

Firecrawl Extract is the structured-data endpoint built on top of Firecrawl's scraper. Pass a URL and a JSON schema; get back validated data. No CSS selectors, no XPath, no regex: Firecrawl runs the page through an LLM with your schema and returns the result. Best for: agents that scrape e-commerce, job boards, news sites, or any source that is structured but laid out differently on every site. Works with: the Firecrawl REST API, the Firecrawl Python / Node SDKs, and the MCP server. Setup time: 2 minutes (sign up at firecrawl.dev for an API key).


One-shot extract

from firecrawl import FirecrawlApp
from pydantic import BaseModel

app = FirecrawlApp(api_key="fc-YOUR-KEY")

class Product(BaseModel):
    name: str
    price: float
    in_stock: bool
    rating: float | None

result = app.extract(
    urls=["https://store.example.com/widgets"],
    schema=Product.model_json_schema(),
    prompt="Extract the headline product on this page",
)

print(result.data)
# {'name': 'Widget Pro', 'price': 49.99, 'in_stock': True, 'rating': 4.6}
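Extraction is LLM-driven, so a field can occasionally come back missing or mistyped; a light guard before trusting the payload is cheap. A minimal stdlib-only sketch, where the expected keys simply mirror the Product model above:

```python
def check_product(data: dict) -> dict:
    """Verify the extract payload has the fields Product promises.

    rating may be None per the schema; everything else is required.
    Raises ValueError on a malformed payload, returns it unchanged otherwise.
    """
    required = {"name": str, "price": float, "in_stock": bool}
    for key, typ in required.items():
        if not isinstance(data.get(key), typ):
            raise ValueError(f"bad or missing field: {key}")
    if data.get("rating") is not None and not isinstance(data["rating"], (int, float)):
        raise ValueError("rating must be a number or None")
    return data

sample = {"name": "Widget Pro", "price": 49.99, "in_stock": True, "rating": 4.6}
checked = check_product(sample)
```

If you are already using Pydantic, `Product.model_validate(result.data)` does the same job with better error messages; the sketch above just avoids re-parsing when all you need is a sanity check.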

Extract across many URLs at once

result = app.extract(
    urls=[
        "https://store.example.com/widget-1",
        "https://store.example.com/widget-2",
        "https://store.example.com/widget-3",
    ],
    schema={
        "type": "object",
        "properties": {
            "products": {
                "type": "array",
                "items": Product.model_json_schema(),
            }
        }
    },
)
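With multiple URLs and the wrapping schema above, the payload is one object holding a products array; flattening it into per-item records is a one-liner. A sketch assuming result.data matches the schema we declared (the shape comes from our schema, not from a guaranteed SDK invariant):

```python
def flatten_products(payload: dict) -> list[dict]:
    """Pull the products array out of the batch-extract payload,
    skipping entries the extractor left empty."""
    return [p for p in payload.get("products", []) if p]

batch = {"products": [
    {"name": "Widget 1", "price": 19.99, "in_stock": True, "rating": None},
    {},  # a page the extractor could not parse
    {"name": "Widget 2", "price": 29.99, "in_stock": False, "rating": 4.1},
]}
rows = flatten_products(batch)
```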

Use as MCP server

Add to your MCP config:

{
  "mcpServers": {
    "firecrawl": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": { "FIRECRAWL_API_KEY": "fc-YOUR-KEY" }
    }
  }
}

Now Claude Code / Cursor / Codex CLI can call firecrawl_scrape, firecrawl_extract, firecrawl_crawl, firecrawl_map directly.
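If you already have an MCP settings file, the entry above can be merged in programmatically rather than pasted by hand. A stdlib sketch; the idea of merging into an existing `mcpServers` map is an assumption about your client's config layout, not part of Firecrawl's docs:

```python
import json

# Same server entry as the config block above.
FIRECRAWL_ENTRY = {
    "command": "npx",
    "args": ["-y", "firecrawl-mcp"],
    "env": {"FIRECRAWL_API_KEY": "fc-YOUR-KEY"},
}

def add_firecrawl(config: dict) -> dict:
    """Return a copy of an MCP config with the firecrawl server added,
    leaving any other registered servers untouched."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["firecrawl"] = FIRECRAWL_ENTRY
    merged["mcpServers"] = servers
    return merged

# e.g. round-trip through the JSON file your client reads:
updated = add_firecrawl({"mcpServers": {"other": {"command": "uvx"}}})
print(json.dumps(updated, indent=2))
```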

Cost vs accuracy

Endpoint   Cost            Use
/scrape    1 credit        Markdown only, no LLM
/extract   1-5 credits     Structured data via LLM
/crawl     1 credit/page   Multi-page site dump
/map       Free            Discover all URLs on a domain first
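The table translates into a quick back-of-envelope check before a big job. A sketch using the per-endpoint credit figures above, taking the upper bound of extract's 1-5 range as the worst case:

```python
# Worst-case credits per call, from the pricing table.
CREDITS = {"scrape": 1, "extract": 5, "crawl": 1, "map": 0}

def estimate_credits(endpoint: str, pages: int = 1) -> int:
    """Worst-case credit spend for `pages` calls to one endpoint."""
    return CREDITS[endpoint] * pages

# Mapping a domain is free, so map first, then extract only what you need:
budget = estimate_credits("map") + estimate_credits("extract", pages=20)
```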

FAQ

Q: Is Firecrawl Extract free? A: Free tier: 500 credits/month for testing. Hobby plan starts at $19/mo for 5K credits. Self-hosted (open-source MIT license) is free but you run your own crawler infrastructure.

Q: How is Extract different from regular Scrape? A: Scrape returns the raw markdown of a page. Extract runs that through an LLM with your schema and returns validated structured data. Extract is more expensive per call but skips post-processing entirely.

Q: Can I self-host Firecrawl? A: Yes. The Firecrawl repo is MIT-licensed and runs on Docker. Self-hosting saves money at scale but you manage the Playwright/proxies/queue. Hosted is faster to start.


Quick Use

  1. Sign up at firecrawl.dev — get an API key (free 500 credits)
  2. pip install firecrawl-py (or npm install @mendable/firecrawl-js)
  3. Use the Pydantic-schema extract snippet above


Source & Thanks

Built by Firecrawl (Mendable). Licensed under MIT (self-host) / commercial (hosted).

firecrawl/firecrawl — ⭐ 30,000+


