# Firecrawl Extract — Structured Data from Any URL

> Firecrawl Extract pulls structured JSON from any URL using a Pydantic/Zod schema. Skip the regex/CSS dance — describe the shape, get clean data.

## Quick Use

1. Sign up at firecrawl.dev — get an API key (free 500 credits)
2. `pip install firecrawl-py` (or `npm install @mendable/firecrawl-js`)
3. Use the Pydantic-schema extract snippet below

---

## Intro

Firecrawl Extract is the structured-data endpoint built on top of Firecrawl's scraper. Pass one or more URLs and a JSON schema; Firecrawl runs each page through an LLM guided by that schema and returns validated data in the shape you described. No CSS selectors, XPath, or regex are needed.

- **Best for:** agents that scrape e-commerce sites, job boards, news sites, or any source whose data is structured but laid out differently on every site.
- **Works with:** the Firecrawl REST API, the Python and Node SDKs, and the MCP server.
- **Setup time:** about 2 minutes (sign up at firecrawl.dev for an API key).
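
For readers calling the REST API without an SDK, the "URL plus JSON schema" request can be sketched with the standard library alone. This is an illustration under stated assumptions, not the documented contract: the endpoint URL, the Bearer-token header, and the body field names (`urls`, `schema`, `prompt`, mirroring the SDK keyword arguments) are assumptions to confirm against the Firecrawl API reference, and `build_extract_request` and `product_schema` are hypothetical names.

```python
import json
import urllib.request

# Assumed endpoint and auth scheme; confirm both in the Firecrawl API reference.
EXTRACT_URL = "https://api.firecrawl.dev/v1/extract"


def build_extract_request(api_key, urls, schema, prompt=None):
    """Build (but do not send) a POST request for the extract endpoint."""
    payload = {"urls": list(urls), "schema": schema}
    if prompt is not None:
        payload["prompt"] = prompt
    return urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hand-written JSON Schema for a simple product record (illustrative only).
product_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
    },
    "required": ["name", "price", "in_stock"],
}

request = build_extract_request(
    "fc-YOUR-KEY",
    ["https://store.example.com/widgets"],
    product_schema,
    prompt="Extract the headline product on this page",
)
print(request.get_method(), request.full_url)
print(json.loads(request.data)["urls"])
```

The sketch stops at building the request, because sending it with `urllib.request.urlopen(request)` would spend credits.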

---

### One-shot extract

```python
from firecrawl import FirecrawlApp
from pydantic import BaseModel

app = FirecrawlApp(api_key="fc-YOUR-KEY")

# Describe the shape of the data you want back.
class Product(BaseModel):
    name: str
    price: float
    in_stock: bool
    rating: float | None  # nullable; the `X | None` syntax needs Python 3.10+

# Extract takes a JSON Schema, so convert the Pydantic model first.
result = app.extract(
    urls=["https://store.example.com/widgets"],
    schema=Product.model_json_schema(),
    prompt="Extract the headline product on this page",
)

print(result.data)
# Example output:
# {'name': 'Widget Pro', 'price': 49.99, 'in_stock': True, 'rating': 4.6}
```
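
Because the data comes from an LLM, it can be worth re-checking the returned dict before using it. Below is a minimal, dependency-free sketch of such a check. `check_product` is a hypothetical helper, not part of the Firecrawl SDK, and the `bad` record is invented for illustration. With Pydantic installed, `Product.model_validate(result.data)` covers the same ground.

```python
def check_product(data):
    """Return a list of problems found in an extracted product dict."""
    expected = {
        "name": (str,),
        "price": (int, float),
        "in_stock": (bool,),
        "rating": (int, float, type(None)),  # nullable, like `float | None`
    }
    problems = []
    for field, types in expected.items():
        if field not in data:
            problems.append(f"missing field: {field}")
            continue
        value = data[field]
        # bool is a subclass of int, so reject it explicitly for numeric fields.
        if isinstance(value, bool) and bool not in types:
            problems.append(f"wrong type for {field}: bool")
        elif not isinstance(value, types):
            problems.append(f"wrong type for {field}: {type(value).__name__}")
    return problems


good = {"name": "Widget Pro", "price": 49.99, "in_stock": True, "rating": 4.6}
bad = {"name": "Widget Pro", "price": "49.99", "in_stock": True}  # made-up record

print(check_product(good))  # []
print(check_product(bad))   # ['wrong type for price: str', 'missing field: rating']
```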

### Extract across many URLs at once

```python
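# Reuses `app` and `Product` from the one-shot example.
result = app.extract(
    urls=[
        "https://store.example.com/widget-1",
        "https://store.example.com/widget-2",
        "https://store.example.com/widget-3",
    ],
    # Wrap the per-product schema in an array to collect one entry per product.
    schema={
        "type": "object",
        "properties": {
            "products": {
                "type": "array",
                "items": Product.model_json_schema(),
            }
        }
    },
)

print(result.data)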
```
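
The wrapping pattern used here (an object with a single array property) applies to any item schema. A small sketch of a reusable wrapper follows. `list_schema` is a hypothetical helper, the `required` entry is an addition not present in the snippet above, and the item schema is written by hand so the example runs without Pydantic.

```python
import json


def list_schema(item_schema, key="products"):
    """Wrap a per-item JSON Schema in an object holding an array under `key`."""
    return {
        "type": "object",
        "properties": {
            key: {"type": "array", "items": item_schema},
        },
        "required": [key],  # assumption: ask for the array to always be present
    }


# Hand-written item schema (illustrative only).
item = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
    },
    "required": ["name", "price"],
}

schema = list_schema(item)
print(json.dumps(schema, indent=2))
```

Passing `key="jobs"` or `key="articles"` gives the same wrapper for other kinds of listing pages.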

### Use as MCP server

Add to your MCP config:

```json
{
  "mcpServers": {
    "firecrawl": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": { "FIRECRAWL_API_KEY": "fc-YOUR-KEY" }
    }
  }
}
```

Now Claude Code / Cursor / Codex CLI can call `firecrawl_scrape`, `firecrawl_extract`, `firecrawl_crawl`, `firecrawl_map` directly.
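
If an MCP config file already lists other servers, the Firecrawl entry should be merged in rather than pasted over them. A minimal sketch of that merge on an in-memory dict is below. `add_firecrawl_server` and the `other-server` entry are hypothetical, and the location of the config file depends on the client, so reading and writing the file is left out.

```python
import copy
import json


def add_firecrawl_server(config, api_key):
    """Return a copy of an MCP config dict with the firecrawl server added."""
    merged = copy.deepcopy(config)
    servers = merged.setdefault("mcpServers", {})
    servers["firecrawl"] = {
        "command": "npx",
        "args": ["-y", "firecrawl-mcp"],
        "env": {"FIRECRAWL_API_KEY": api_key},
    }
    return merged


# Made-up existing config with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "other-cmd", "args": []}}}
updated = add_firecrawl_server(existing, "fc-YOUR-KEY")
print(json.dumps(updated, indent=2))
```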

### Cost by endpoint

| Endpoint | Cost | Use case |
|---|---|---|
| `/scrape` | 1 credit | Just markdown, no LLM |
| `/extract` | 1-5 credits | Structured data via LLM |
| `/crawl` | 1 credit/page | Multi-page site dump |
| `/map` | Free | Discover all URLs on a domain first |
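
To budget a job, the per-call costs in the table can be turned into a rough credit range. The sketch below only does the arithmetic with the table's numbers. `estimate_credits` is a hypothetical helper, and actual billing may differ.

```python
# Per-call credit ranges (low, high), copied from the table above.
CREDITS = {
    "scrape": (1, 1),
    "extract": (1, 5),
    "crawl_page": (1, 1),
    "map": (0, 0),
}


def estimate_credits(**calls):
    """Return (low, high) total credits for the given call counts."""
    low = sum(CREDITS[name][0] * count for name, count in calls.items())
    high = sum(CREDITS[name][1] * count for name, count in calls.items())
    return low, high


# 10 scrapes, 20 extracts, a 50-page crawl, and one map call.
print(estimate_credits(scrape=10, extract=20, crawl_page=50, map=1))  # (80, 160)

# How many extract calls fit in 500 credits, worst case and best case.
print(500 // CREDITS["extract"][1], 500 // CREDITS["extract"][0])  # 100 500
```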

---

### FAQ

**Q: Is Firecrawl Extract free?**
A: The free tier includes 500 credits/month for testing. The Hobby plan starts at $19/mo for 5K credits. Self-hosting the open-source code (AGPL-3.0) is free, but you run your own crawler infrastructure.

**Q: How is Extract different from regular Scrape?**
A: Scrape returns the raw markdown of a page. Extract runs that through an LLM with your schema and returns validated structured data. Extract is more expensive per call but skips post-processing entirely.

**Q: Can I self-host Firecrawl?**
A: Yes. The Firecrawl repo is open source (primarily AGPL-3.0, with the SDKs under MIT) and runs on Docker. Self-hosting saves money at scale, but you manage Playwright, proxies, and the job queue yourself. The hosted service is faster to start with.

---

## Source & Thanks

> Built by [Firecrawl (Mendable)](https://github.com/firecrawl). Licensed under AGPL-3.0 (self-host; SDKs under MIT) / commercial (hosted).
>
> [firecrawl/firecrawl](https://github.com/firecrawl/firecrawl) — ⭐ 30,000+

---
Source: https://tokrepo.com/en/workflows/firecrawl-extract-structured-data-from-any-url
Author: Firecrawl