Scripts · Apr 7, 2026 · 2 min read

Firecrawl — Web Scraping API for AI Applications

Turn any website into clean markdown or structured data for LLMs. Firecrawl handles JavaScript rendering, anti-bot bypassing, sitemaps, and batch crawling through a simple API.

What is Firecrawl?

Firecrawl is a web scraping API designed for AI applications. It converts any website into clean markdown or structured data that LLMs can consume. It handles JavaScript rendering, anti-bot detection, rate limiting, and sitemap discovery — so you can focus on building your AI pipeline.

Answer-Ready: Firecrawl is a web scraping API that converts websites into clean markdown or structured data for LLMs. It handles JavaScript rendering, anti-bot bypassing, and batch crawling, and is used by major AI companies for RAG and training data. The project has 30k+ GitHub stars.

Best for: AI teams building RAG pipelines or data extraction workflows.
Works with: Any LLM framework, LangChain, LlamaIndex, Claude Code.
Setup time: Under 2 minutes.

Core Features

1. Single Page Scrape

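from firecrawl import FirecrawlApp  # pip install firecrawl-py

app = FirecrawlApp(api_key="fc-YOUR_API_KEY")  # placeholder; use your own key
result = app.scrape_url("https://example.com", params={
    "formats": ["markdown", "html", "links"],
    "onlyMainContent": True,  # Strip nav, footer, ads
})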

2. Full Site Crawl

crawl = app.crawl_url("https://docs.example.com", params={
    "limit": 500,           # Max pages
    "maxDepth": 3,          # Link depth
    "includePaths": ["/docs/*"],
    "excludePaths": ["/blog/*"],
})
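
Crawled markdown usually has to be split into smaller pieces before it is embedded. The following is a minimal sketch of one way to do that, splitting at heading boundaries and capping chunk size. The function name `chunk_markdown` and the `max_chars` default are illustrative assumptions and not part of Firecrawl.

```python
import re
from typing import List


def chunk_markdown(markdown: str, max_chars: int = 1000) -> List[str]:
    """Split markdown into chunks at heading boundaries, capped at max_chars."""
    # Split just before every line that starts with 1-6 '#' characters and a space.
    sections = re.split(r"^(?=#{1,6} )", markdown, flags=re.MULTILINE)
    chunks: List[str] = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        # Sections longer than max_chars are cut into fixed-size windows.
        for start in range(0, len(section), max_chars):
            chunks.append(section[start:start + max_chars])
    return chunks


# Hypothetical page content, standing in for the markdown of one crawled page.
sample = "# Title\nIntro text.\n\n## Install\nRun pip.\n"
chunks = chunk_markdown(sample)
```

Splitting on headings keeps each chunk topically coherent, which tends to help retrieval quality. A production pipeline would likely also add overlap between windows.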

3. Structured Extraction

result = app.scrape_url("https://example.com/pricing", params={
    "formats": ["extract"],
    "extract": {
        "schema": {
            "type": "object",
            "properties": {
                "plans": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "price": {"type": "string"},
                            "features": {"type": "array", "items": {"type": "string"}}
                        }
                    }
                }
            }
        }
    }
})
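
LLM-based extraction can return data that deviates from the requested schema, so it may be worth checking the shape before using it. Below is a minimal sketch of such a check for the `plans` schema above. The helper `validate_plans` and the sample dictionaries are hypothetical, and the sketch assumes the extracted data has already been pulled out of the scrape response as a plain dictionary.

```python
from typing import Any, Dict, List


def validate_plans(data: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the shape matches."""
    plans = data.get("plans")
    if not isinstance(plans, list):
        return ["'plans' is missing or not a list"]
    problems: List[str] = []
    for i, plan in enumerate(plans):
        if not isinstance(plan, dict):
            problems.append(f"plans[{i}] is not an object")
            continue
        # The schema declares no required keys, so only present keys are checked.
        for key in ("name", "price"):
            if key in plan and not isinstance(plan[key], str):
                problems.append(f"plans[{i}].{key} is not a string")
        features = plan.get("features", [])
        if not isinstance(features, list) or not all(isinstance(f, str) for f in features):
            problems.append(f"plans[{i}].features is not a list of strings")
    return problems


# Hypothetical extraction results, for illustration only.
good = {"plans": [{"name": "Hobby", "price": "$16/mo", "features": ["3,000 pages"]}]}
bad = {"plans": [{"name": "Pro", "price": 83}]}
```

For larger schemas, a dedicated JSON Schema validator would be a better fit than hand-written checks.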

4. Map (Discover URLs)

links = app.map_url("https://example.com")
print(f"Found {len(links)} pages")
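
Once URLs have been discovered, it is often useful to narrow them down locally before scraping. The sketch below filters a list of URLs by path using shell-style globs from the Python standard library. The helper `filter_urls` and the sample URLs are hypothetical, and this local glob matching is not claimed to behave the same way as Firecrawl's own `includePaths` and `excludePaths` options.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List
from urllib.parse import urlparse


def filter_urls(urls: Iterable[str], include: List[str], exclude: List[str]) -> List[str]:
    """Keep URLs whose path matches any include glob and no exclude glob."""
    kept: List[str] = []
    for url in urls:
        path = urlparse(url).path or "/"
        # An empty include list means "include everything".
        if include and not any(fnmatchcase(path, pattern) for pattern in include):
            continue
        if any(fnmatchcase(path, pattern) for pattern in exclude):
            continue
        kept.append(url)
    return kept


# Hypothetical discovered URLs, for illustration only.
discovered = [
    "https://example.com/docs/intro",
    "https://example.com/blog/post",
    "https://example.com/docs/api/auth",
    "https://example.com/",
]
selected = filter_urls(discovered, include=["/docs/*"], exclude=["/docs/api/*"])
```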

5. Self-Hosting

git clone https://github.com/mendableai/firecrawl
docker compose up -d
# API at http://localhost:3002
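
A self-hosted instance can be called over plain HTTP. The sketch below only builds the request object, using the standard library, so it can be inspected without a running server. The `/v1/scrape` path and the JSON body shape are assumptions based on the v1 REST API; check them against the version you deploy. The helper name `build_scrape_request` is hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3002"  # address from the self-hosting step above


def build_scrape_request(url: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for a page as markdown."""
    # Assumed body shape: target URL plus the desired output formats.
    body = json.dumps({"url": url, "formats": ["markdown"]}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/scrape",  # assumed endpoint path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_scrape_request("https://example.com")
# To actually send it: urllib.request.urlopen(request)
```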

Use Cases

| Use Case | How |
|---|---|
| RAG Pipeline | Crawl docs → markdown → embed → vector DB |
| Competitive Intel | Scrape competitor pricing pages |
| Training Data | Extract clean text from web sources |
| Monitoring | Track website changes over time |
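
For the monitoring use case, one simple approach is to store a fingerprint of each page's markdown and compare it on the next run. The following is a minimal sketch under that assumption. The helpers `fingerprint` and `changed_pages` are hypothetical names, and the sample content is made up.

```python
import hashlib
from typing import Dict, List


def fingerprint(markdown: str) -> str:
    """Hash the markdown after trimming whitespace that does not affect content."""
    normalized = "\n".join(line.rstrip() for line in markdown.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def changed_pages(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """previous maps URL -> stored fingerprint; current maps URL -> fresh markdown.

    Returns the sorted URLs that are new or whose content changed.
    """
    return sorted(
        url for url, markdown in current.items()
        if previous.get(url) != fingerprint(markdown)
    )


# Hypothetical stored state and a fresh scrape, for illustration only.
stored = {"https://example.com/pricing": fingerprint("# Pricing\nHobby: $16")}
fresh = {
    "https://example.com/pricing": "# Pricing  \nHobby: $16\n",  # whitespace-only change
    "https://example.com/new": "# New page",
}
changed = changed_pages(stored, fresh)
```

Hashing the main content only (for example with `onlyMainContent`) helps avoid false positives from rotating banners or footers.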

Pricing

| Tier | Pages/mo | Price |
|---|---|---|
| Free | 500 | $0 |
| Hobby | 3,000 | $16/mo |
| Standard | 100,000 | $83/mo |
| Self-hosted | Unlimited | Free |
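
To compare the hosted tiers, it helps to work out the cost per 1,000 pages and the smallest tier that covers a given monthly volume. The sketch below uses only the figures listed above. The helper names are hypothetical, and the prices may have changed since publication.

```python
from typing import Optional

# Hosted tiers as listed above: (name, pages per month, USD per month).
TIERS = [("Free", 500, 0), ("Hobby", 3_000, 16), ("Standard", 100_000, 83)]


def pick_tier(pages_per_month: int) -> Optional[str]:
    """Return the smallest listed tier whose quota covers the volume, if any."""
    for name, quota, _price in TIERS:
        if pages_per_month <= quota:
            return name
    return None  # volume exceeds every listed hosted tier


def cost_per_1000_pages(price: float, quota: int) -> float:
    """Effective USD cost per 1,000 pages when the full quota is used."""
    return price / quota * 1000


hobby_rate = round(cost_per_1000_pages(16, 3_000), 2)
standard_rate = round(cost_per_1000_pages(83, 100_000), 2)
```

At full usage, the Standard tier works out to roughly $0.83 per 1,000 pages versus about $5.33 on Hobby.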

FAQ

Q: How does it handle JavaScript-heavy sites? A: Firecrawl uses headless browsers to render JavaScript before extraction.

Q: Can I self-host? A: Yes, fully open-source. Docker Compose deployment available.

Q: How does it compare to Jina Reader? A: Firecrawl offers full site crawling, structured extraction, and sitemap discovery. Jina Reader is simpler (URL prefix for single pages).

Source and Acknowledgements

Created by Mendable. Licensed under AGPL-3.0.

mendableai/firecrawl — 30k+ stars
