Scripts · Apr 7, 2026 · 2 min read

Firecrawl — Web Scraping API for AI Applications

Turn any website into clean markdown or structured data for LLMs. Firecrawl handles JavaScript rendering, anti-bot bypassing, sitemaps, and batch crawling through a simple API.

Prompt Lab · Community
Quick Use

Use it first, then decide how deep to go

Install the Python SDK:

pip install firecrawl-py

Then, in Python:

from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-...")

# Scrape a single page
result = app.scrape_url("https://docs.anthropic.com", params={"formats": ["markdown"]})
print(result["markdown"])

# Crawl entire site
crawl = app.crawl_url("https://docs.anthropic.com", params={"limit": 100})
for page in crawl["data"]:
    print(page["markdown"][:200])
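
A common next step is writing each crawled page to disk. The sketch below assumes each page is a dict with a "markdown" key, as in the loop above, plus a "metadata" dict carrying a "sourceURL" field; that second field is an assumption about the response shape, and the helper names `url_to_filename` and `save_pages` are hypothetical.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def url_to_filename(url: str) -> str:
    """Turn a URL into a filesystem-safe markdown filename."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc}{parsed.path}".strip("/")
    # Replace anything that is not a letter, digit, dot, or hyphen.
    slug = re.sub(r"[^A-Za-z0-9.-]+", "_", raw).strip("_")
    return f"{slug or 'index'}.md"


def save_pages(pages: list[dict], out_dir: str) -> list[Path]:
    """Write each page's markdown to out_dir and return the written paths."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for i, page in enumerate(pages):
        # "metadata"/"sourceURL" is an assumed field; fall back to an index.
        url = page.get("metadata", {}).get("sourceURL", f"page-{i}")
        path = target / url_to_filename(url)
        path.write_text(page.get("markdown", ""), encoding="utf-8")
        written.append(path)
    return written
```

With the crawl result from above, this could be called as `save_pages(crawl["data"], "out")`.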

What is Firecrawl?

Firecrawl is a web scraping API designed for AI applications. It converts any website into clean markdown or structured data that LLMs can consume. It handles JavaScript rendering, anti-bot measures, rate limiting, and sitemap discovery, so you can focus on building your AI pipeline.

Answer-Ready: Firecrawl is a web scraping API that converts websites into clean markdown or structured data for LLMs. It handles JavaScript rendering, anti-bot bypassing, and batch crawling. It is used by major AI companies for RAG and training data, and the project has 30k+ GitHub stars.

Best for: AI teams building RAG pipelines or data extraction workflows.
Works with: Any LLM framework, LangChain, LlamaIndex, Claude Code.
Setup time: Under 2 minutes.

Core Features

1. Single Page Scrape

result = app.scrape_url("https://example.com", params={
    "formats": ["markdown", "html", "links"],
    "onlyMainContent": True,  # Strip nav, footer, ads
})

2. Full Site Crawl

crawl = app.crawl_url("https://docs.example.com", params={
    "limit": 500,           # Max pages
    "maxDepth": 3,          # Link depth
    "includePaths": ["/docs/*"],
    "excludePaths": ["/blog/*"],
})
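
To make the intent of the include and exclude options concrete, here is a minimal client-side sketch that keeps URLs under /docs/ and drops those under /blog/. It is not Firecrawl's implementation: glob matching via `fnmatchcase` is an assumption made for illustration, and the service may interpret these patterns differently (for example as regular expressions), so check its documentation before relying on a pattern. The function name `path_allowed` is hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def path_allowed(url: str, include: list[str], exclude: list[str]) -> bool:
    """Return True if the URL path matches an include glob and no exclude glob."""
    path = urlparse(url).path or "/"
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False
    # An empty include list means "allow everything not excluded".
    return not include or any(fnmatchcase(path, pattern) for pattern in include)


urls = [
    "https://docs.example.com/docs/intro",
    "https://docs.example.com/blog/launch",
    "https://docs.example.com/pricing",
]
# Only the /docs/ URL survives both filters.
kept = [u for u in urls if path_allowed(u, ["/docs/*"], ["/blog/*"])]
```

A filter like this can also be run over returned pages as a sanity check that the crawl stayed within the expected section.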

3. Structured Extraction

result = app.scrape_url("https://example.com/pricing", params={
    "formats": ["extract"],
    "extract": {
        "schema": {
            "type": "object",
            "properties": {
                "plans": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "price": {"type": "string"},
                            "features": {"type": "array", "items": {"type": "string"}}
                        }
                    }
                }
            }
        }
    }
})
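
Extracted data may not always match the requested schema, so it can be worth checking the shape before using it. The sketch below assumes the extracted object is returned under an "extract" key of the result; that key and the helper name `parse_plans` are assumptions, not confirmed by the text above.

```python
def parse_plans(result: dict) -> list[dict]:
    """Return only the plan entries that match the schema's expected types."""
    # Assumption: the extracted object lives under an "extract" key.
    extracted = result.get("extract") or {}
    plans = extracted.get("plans")
    if not isinstance(plans, list):
        return []
    valid = []
    for plan in plans:
        if not isinstance(plan, dict):
            continue
        name, price = plan.get("name"), plan.get("price")
        features = plan.get("features", [])
        if (
            isinstance(name, str)
            and isinstance(price, str)
            and isinstance(features, list)
            and all(isinstance(f, str) for f in features)
        ):
            valid.append({"name": name, "price": price, "features": features})
    return valid
```

Entries with a missing name or a non-string price are dropped rather than raising, which keeps a downstream pipeline running on partial results.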

4. Map (Discover URLs)

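result = app.map_url("https://example.com")
# Some SDK versions return a response with a "links" field rather than a bare list.
links = result["links"] if isinstance(result, dict) else result
print(f"Found {len(links)} pages")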
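
Once the URLs are discovered, a quick summary by site section can help decide what to crawl and what limit to set. This sketch assumes the discovered URLs are available as a plain list of strings; the helper name `count_sections` is hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def count_sections(urls: list[str]) -> Counter:
    """Count URLs by their first path segment ("/" for the site root)."""
    counts = Counter()
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        section = f"/{segments[0]}" if segments else "/"
        counts[section] += 1
    return counts
```

For example, a site where most URLs fall under one section suggests restricting the crawl to that path.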

5. Self-Hosting

git clone https://github.com/mendableai/firecrawl
cd firecrawl
docker compose up -d
# API at http://localhost:3002
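
To check a local instance without the SDK, a request can be built with the standard library. This is a sketch under assumptions: the endpoint path `/v1/scrape`, the JSON body fields, and the Bearer authorization header reflect one version of the REST API and may differ in your deployment; `build_scrape_request` is a hypothetical helper. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_scrape_request(
    base_url: str, target_url: str, api_key: str = ""
) -> urllib.request.Request:
    """Build (but do not send) a POST request for a single-page scrape."""
    body = json.dumps({"url": target_url, "formats": ["markdown"]}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/scrape",  # assumed endpoint path
        data=body,
        headers=headers,
        method="POST",
    )


request = build_scrape_request("http://localhost:3002", "https://example.com")
# To send it once the containers are running:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```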

Use Cases

| Use Case | How |
|---|---|
| RAG Pipeline | Crawl docs → markdown → embed → vector DB |
| Competitive Intel | Scrape competitor pricing pages |
| Training Data | Extract clean text from web sources |
| Monitoring | Track website changes over time |
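
For the RAG pipeline case, the markdown usually needs to be split into chunks before embedding. The following is a minimal sketch of one way to do that, grouping paragraphs up to a character budget; the function name `chunk_markdown` and the default of 1,000 characters are illustrative assumptions, and the embedding and vector DB steps are left out.

```python
def chunk_markdown(text: str, max_chars: int = 1000) -> list[str]:
    """Group paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The budget would be exceeded: close the current chunk.
            if current:
                chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Splitting on blank lines keeps paragraphs intact, which tends to produce more coherent chunks than cutting at a fixed character offset.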

Pricing

| Tier | Pages/mo | Price |
|---|---|---|
| Free | 500 | $0 |
| Hobby | 3,000 | $16/mo |
| Standard | 100,000 | $83/mo |
| Self-hosted | Unlimited | Free |
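
To compare the paid tiers, it helps to work out the cost per 1,000 pages. The short calculation below uses only the figures listed above and assumes the full monthly quota is used.

```python
# Figures copied from the pricing list: (pages per month, price in USD).
tiers = {"Hobby": (3_000, 16), "Standard": (100_000, 83)}


def cost_per_thousand(pages: int, price: float) -> float:
    """Price per 1,000 pages, rounded to cents."""
    return round(price / pages * 1000, 2)


costs = {name: cost_per_thousand(*tier) for name, tier in tiers.items()}
```

At full usage this works out to about $5.33 per 1,000 pages on Hobby and $0.83 per 1,000 pages on Standard.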

FAQ

Q: How does it handle JavaScript-heavy sites? A: Firecrawl uses headless browsers to render JavaScript before extraction.

Q: Can I self-host? A: Yes, fully open-source. Docker Compose deployment available.

Q: How does it compare to Jina Reader? A: Firecrawl offers full site crawling, structured extraction, and sitemap discovery. Jina Reader is simpler (URL prefix for single pages).

Source & Thanks

Created by Mendable. Licensed under AGPL-3.0.

mendableai/firecrawl — 30k+ stars
