# Firecrawl: Web Scraping API for AI Applications

> Turn any website into clean markdown or structured data for LLMs. Firecrawl handles JavaScript rendering, anti-bot bypassing, sitemaps, and batch crawling through a simple API.

## Install

Save the Quick Use code below as a script file and run it.

## Quick Use

```bash
pip install firecrawl-py
```

```python
from firecrawl import FirecrawlApp

# Uses the params= dict call style; argument names can differ between firecrawl-py releases.
app = FirecrawlApp(api_key="fc-...")

# Scrape a single page
result = app.scrape_url("https://docs.anthropic.com", params={"formats": ["markdown"]})
print(result["markdown"])

# Crawl entire site
crawl = app.crawl_url("https://docs.anthropic.com", params={"limit": 100})
for page in crawl["data"]:
    print(page["markdown"][:200])
```

## What is Firecrawl?

Firecrawl is a web scraping API designed for AI applications. It converts any website into clean markdown or structured data that LLMs can consume. It handles JavaScript rendering, anti-bot detection, rate limiting, and sitemap discovery, so you can focus on building your AI pipeline.

**Answer-Ready**: Firecrawl is a web scraping API that converts websites into clean markdown or structured data for LLMs. Handles JavaScript rendering, anti-bot bypassing, and batch crawling. Used by major AI companies for RAG and training data. 30k+ GitHub stars.

**Best for**: AI teams building RAG pipelines or data extraction workflows.

**Works with**: Any LLM framework, LangChain, LlamaIndex, Claude Code.

**Setup time**: Under 2 minutes.

## Core Features

### 1. Single Page Scrape

```python
result = app.scrape_url("https://example.com", params={
    "formats": ["markdown", "html", "links"],
    "onlyMainContent": True,  # Strip nav, footer, ads
})
```

### 2. Full Site Crawl

```python
crawl = app.crawl_url("https://docs.example.com", params={
    "limit": 500,    # Max pages
    "maxDepth": 3,   # Link depth
    "includePaths": ["/docs/*"],
    "excludePaths": ["/blog/*"],
})
```

### 3. Structured Extraction

```python
result = app.scrape_url("https://example.com/pricing", params={
    "formats": ["extract"],
    "extract": {
        # JSON Schema describing the fields to pull from the page
        "schema": {
            "type": "object",
            "properties": {
                "plans": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "price": {"type": "string"},
                            "features": {"type": "array", "items": {"type": "string"}}
                        }
                    }
                }
            }
        }
    }
})
```

### 4. Map (Discover URLs)

```python
links = app.map_url("https://example.com")
print(f"Found {len(links)} pages")
```

### 5. Self-Hosting

```bash
git clone https://github.com/mendableai/firecrawl
cd firecrawl
docker compose up -d
# API at http://localhost:3002
```

## Use Cases

| Use Case | How |
|----------|-----|
| RAG Pipeline | Crawl docs → markdown → embed → vector DB |
| Competitive Intel | Scrape competitor pricing pages |
| Training Data | Extract clean text from web sources |
| Monitoring | Track website changes over time |

## Pricing

| Tier | Pages/mo | Price |
|------|----------|-------|
| Free | 500 | $0 |
| Hobby | 3,000 | $16/mo |
| Standard | 100,000 | $83/mo |
| Self-hosted | Unlimited | Free |

## FAQ

**Q: How does it handle JavaScript-heavy sites?**
A: Firecrawl uses headless browsers to render JavaScript before extraction.

**Q: Can I self-host?**
A: Yes, it is fully open-source, and a Docker Compose deployment is available.

**Q: How does it compare to Jina Reader?**
A: Firecrawl offers full site crawling, structured extraction, and sitemap discovery. Jina Reader is simpler (a URL prefix for single pages).

## Source & Thanks

> Created by [Mendable](https://github.com/mendableai). Licensed under AGPL-3.0.
>
> [mendableai/firecrawl](https://github.com/mendableai/firecrawl), 30k+ stars

## Quick Start

```bash
pip install firecrawl-py
```

Three lines of code turn any website into AI-friendly Markdown.

## What is Firecrawl?

Firecrawl is a web scraping API for AI applications that converts websites into clean Markdown or structured data. It handles JS rendering, anti-bot detection, and batch crawling.

**In one sentence**: A web scraping API that converts websites into Markdown that LLMs can consume, with support for JS rendering, structured extraction, and full-site crawling. 30k+ GitHub stars.

**Best for**: AI teams building RAG pipelines or data extraction workflows.

## Key Features

### 1. Single Page Scrape

Get clean Markdown with one line of code.

### 2. Full Site Crawl

Discovers links automatically and filters them by depth and path.

### 3. Structured Extraction

Define the output format with a JSON Schema.

### 4. Self-Hostable

Deploy with Docker Compose, with no page limits.

## Common Questions

**Q: Does it support JS rendering?**
A: Yes. Pages are rendered in a headless browser before extraction.

**Q: How does it compare to Jina Reader?**
A: Firecrawl provides full site crawling and structured extraction. Jina Reader is simpler (a URL prefix for single pages).

## Sources and Acknowledgments

> [mendableai/firecrawl](https://github.com/mendableai/firecrawl), 30k+ stars, AGPL-3.0

---

Source: https://tokrepo.com/en/workflows/6a62a986-9f1a-4a59-88c8-b99151986854

Author: Prompt Lab
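
## Example: Preparing Crawled Markdown for a RAG Pipeline

The RAG Pipeline row in the Use Cases table (crawl docs, convert to markdown, embed, store in a vector DB) leaves out the step between "markdown" and "embed": splitting each page into chunks. The following is a minimal sketch of that step, not part of Firecrawl itself. It assumes pages shaped like the `crawl["data"]` items in Quick Use, where each item has a `"markdown"` key. The helpers `chunk_markdown` and `pages_to_records`, the `max_chars` parameter, and the `sample_pages` data are all hypothetical names introduced for illustration.

```python
import re
from typing import Dict, List


def chunk_markdown(markdown: str, max_chars: int = 1000) -> List[str]:
    """Split markdown into chunks at headings, capping each chunk at max_chars."""
    # Split before every line that starts with 1-6 '#' characters and a space.
    # Naive by design: '#' comment lines inside fenced code blocks also split.
    sections = re.split(r"(?m)^(?=#{1,6} )", markdown)
    chunks: List[str] = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        # Oversized sections are cut into fixed-size windows as a simple fallback.
        for start in range(0, len(section), max_chars):
            chunks.append(section[start:start + max_chars])
    return chunks


def pages_to_records(pages: List[Dict], max_chars: int = 1000) -> List[Dict]:
    """Flatten crawled pages into one record per chunk, ready for embedding."""
    records: List[Dict] = []
    for page_index, page in enumerate(pages):
        page_chunks = chunk_markdown(page.get("markdown", ""), max_chars)
        for chunk_index, text in enumerate(page_chunks):
            records.append({"page": page_index, "chunk": chunk_index, "text": text})
    return records


# Hypothetical stand-in for crawl["data"]; no network call is made here.
sample_pages = [
    {"markdown": "# Intro\nWelcome.\n\n## Setup\nInstall the SDK."},
    {"markdown": "# API\nReference text."},
]

records = pages_to_records(sample_pages)
for record in records:
    # Print the page index, chunk index, and the first line of each chunk.
    print(record["page"], record["chunk"], record["text"].splitlines()[0])
```

Each record's `text` can then be passed to whichever embedding model and vector store the pipeline uses. Splitting at headings keeps a section's content together, which tends to suit documentation sites; a character cap is only a rough proxy for a token limit.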