# Firecrawl Extract: Structured Data from Any URL

> Firecrawl Extract pulls structured JSON from any URL using a Pydantic/Zod schema. Skip the regex/CSS dance: describe the shape, get clean data.

## Quick Use

1. Sign up at firecrawl.dev and get an API key (free 500 credits)
2. `pip install firecrawl-py` (or `npm install @mendable/firecrawl-js`)
3. Use the Pydantic-schema extract snippet below

---

## Intro

Firecrawl Extract is the structured-data endpoint on top of Firecrawl's scraper. Pass a URL and a JSON schema; get back validated data. No CSS selectors, no XPath, no regex: Firecrawl runs the page through an LLM with your schema and returns the result.

**Best for:** agents that scrape e-commerce, job boards, news sites, or any source whose data is structured but laid out differently on each site.

**Works with:** Firecrawl REST API, Firecrawl Python / Node SDK, MCP server.

**Setup time:** 2 minutes (sign up at firecrawl.dev for an API key).

---

### One-shot extract

```python
from firecrawl import FirecrawlApp
from pydantic import BaseModel

app = FirecrawlApp(api_key="fc-YOUR-KEY")


class Product(BaseModel):
    name: str
    price: float
    in_stock: bool
    rating: float | None  # may be null when the page shows no rating


# Pass the model's JSON schema; the prompt tells the LLM which item to pick.
result = app.extract(
    urls=["https://store.example.com/widgets"],
    schema=Product.model_json_schema(),
    prompt="Extract the headline product on this page",
)
print(result.data)
# Example output:
# {'name': 'Widget Pro', 'price': 49.99, 'in_stock': True, 'rating': 4.6}
```

### Extract across many URLs at once

```python
# Reuses `app` and `Product` from the one-shot snippet.
result = app.extract(
    urls=[
        "https://store.example.com/widget-1",
        "https://store.example.com/widget-2",
        "https://store.example.com/widget-3",
    ],
    # Wrap the Product schema in an array so one call returns every product.
    schema={
        "type": "object",
        "properties": {
            "products": {
                "type": "array",
                "items": Product.model_json_schema(),
            }
        },
    },
)
```

### Use as MCP server

Add to your MCP config:

```json
{
  "mcpServers": {
    "firecrawl": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": {
        "FIRECRAWL_API_KEY": "fc-YOUR-KEY"
      }
    }
  }
}
```

Now Claude Code / Cursor / Codex CLI can call `firecrawl_scrape`, `firecrawl_extract`, `firecrawl_crawl`, and `firecrawl_map` directly.

### Cost vs accuracy

| Endpoint | Cost | Use |
|---|---|---|
| `/scrape` | 1 credit | Just markdown, no LLM |
| `/extract` | 1-5 credits | Structured data via LLM |
| `/crawl` | 1 credit/page | Multi-page site dump |
| `/map` | Free | Discover all URLs on a domain first |

---

### FAQ

**Q: Is Firecrawl Extract free?**
A: The free tier gives 500 credits/month for testing. The Hobby plan starts at $19/mo for 5K credits. Self-hosting the open-source code is free, but you run your own crawler infrastructure.

**Q: How is Extract different from regular Scrape?**
A: Scrape returns the raw markdown of a page. Extract runs that markdown through an LLM with your schema and returns validated structured data. Extract costs more per call but removes the post-processing step.

**Q: Can I self-host Firecrawl?**
A: Yes. The Firecrawl repo is open source (primarily AGPL-3.0; the SDKs are MIT-licensed) and runs on Docker. Self-hosting saves money at scale, but you manage Playwright, proxies, and the queue yourself. The hosted version is faster to start with.

---

## Source & Thanks

> Built by [Firecrawl (Mendable)](https://github.com/firecrawl). Licensed under AGPL-3.0 (self-host) / commercial (hosted).
>
> [firecrawl/firecrawl](https://github.com/firecrawl/firecrawl) (⭐ 30,000+)

---

Source: https://tokrepo.com/en/workflows/firecrawl-extract-structured-data-from-any-url

Author: Firecrawl
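
### Estimating credit usage

The per-endpoint costs in the "Cost vs accuracy" table can be turned into a rough budget check before running a job. The sketch below is only an illustration: the credit figures are copied from that table, while `CREDITS_PER_CALL`, `estimate_credits`, and the workload numbers are hypothetical names and values, not part of any Firecrawl SDK. Because `/extract` is listed as a 1-5 credit range, the helper returns a low and a high estimate.

```python
# Credits per call, copied from the "Cost vs accuracy" table.
# Each entry is (lowest, highest); /extract is listed as a 1-5 credit range.
CREDITS_PER_CALL = {
    "scrape": (1, 1),
    "extract": (1, 5),
    "crawl": (1, 1),  # per crawled page
    "map": (0, 0),    # listed as free
}


def estimate_credits(calls: dict[str, int]) -> tuple[int, int]:
    """Return (lowest, highest) total credits for a mix of endpoint calls."""
    low = sum(CREDITS_PER_CALL[name][0] * count for name, count in calls.items())
    high = sum(CREDITS_PER_CALL[name][1] * count for name, count in calls.items())
    return low, high


# Hypothetical workload: map one domain, then extract from 120 product pages.
low, high = estimate_credits({"map": 1, "extract": 120})
print(low, high)  # 120 600

# Compare the worst case against the 500-credit free tier mentioned in the FAQ.
FREE_TIER_CREDITS = 500
print(high <= FREE_TIER_CREDITS)  # False
```

In this made-up workload the best case fits the free tier but the worst case does not, so it would be safer to map first, pick a smaller URL set, or plan for a paid tier.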