What is Firecrawl?
Firecrawl is a web scraping API designed for AI applications. It converts any website into clean markdown or structured data that LLMs can consume. It handles JavaScript rendering, anti-bot detection, rate limiting, and sitemap discovery, so you can focus on building your AI pipeline.
Answer-Ready: Firecrawl is a web scraping API that converts websites into clean markdown or structured data for LLMs. It handles JavaScript rendering, anti-bot bypassing, and batch crawling. It is used by major AI companies for RAG and training data, and the project has 30k+ GitHub stars.
Best for: AI teams building RAG pipelines or data extraction workflows.
Works with: Any LLM framework, LangChain, LlamaIndex, Claude Code.
Setup time: Under 2 minutes.
Core Features
1. Single Page Scrape
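from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-YOUR_API_KEY")  # Placeholder key; this client is reused in the examples below
result = app.scrape_url("https://example.com", params={
    "formats": ["markdown", "html", "links"],
    "onlyMainContent": True,  # Strip nav, footer, ads
})
2. Full Site Crawl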
crawl = app.crawl_url("https://docs.example.com", params={
    "limit": 500,  # Max pages
    "maxDepth": 3,  # Link depth
    "includePaths": ["/docs/*"],
    "excludePaths": ["/blog/*"],
})
3. Structured Extraction
result = app.scrape_url("https://example.com/pricing", params={
    "formats": ["extract"],
    "extract": {
        "schema": {
            "type": "object",
            "properties": {
                "plans": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "price": {"type": "string"},
                            "features": {"type": "array", "items": {"type": "string"}}
                        }
                    }
                }
            }
        }
    }
})
4. Map (Discover URLs)
links = app.map_url("https://example.com")
print(f"Found {len(links)} pages")
5. Self-Hosting
git clone https://github.com/mendableai/firecrawl
cd firecrawl
docker compose up -d
# API at http://localhost:3002
Use Cases
| Use Case | How |
|---|---|
| RAG Pipeline | Crawl docs → markdown → embed → vector DB |
| Competitive Intel | Scrape competitor pricing pages |
| Training Data | Extract clean text from web sources |
| Monitoring | Track website changes over time |
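The RAG row above depends on a step the table only implies: splitting each page's markdown into chunks before embedding. The following is a minimal sketch, assuming the crawled pages are already available as markdown strings. `chunk_markdown` and `max_chars` are hypothetical names chosen for illustration, and the embedding and vector DB steps are left out because they depend on your stack.

```python
import re


def chunk_markdown(markdown: str, max_chars: int = 1000) -> list[str]:
    """Split markdown into chunks at headings, then by paragraph if a section is too long."""
    # Split before every line that starts with 1-6 '#' characters followed by a space.
    sections = re.split(r"(?m)^(?=#{1,6} )", markdown)
    chunks: list[str] = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        if len(section) <= max_chars:
            chunks.append(section)
            continue
        # Oversized section: pack paragraphs greedily up to max_chars.
        # A single paragraph longer than max_chars is kept whole.
        current = ""
        for para in section.split("\n\n"):
            candidate = f"{current}\n\n{para}" if current else para
            if len(candidate) > max_chars and current:
                chunks.append(current)
                current = para
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks


page = "# Intro\n\nFirecrawl basics.\n\n## Crawling\n\nCrawl a site."
for chunk in chunk_markdown(page):
    print(repr(chunk))
```

Splitting on headings keeps each chunk topically coherent, which tends to matter more for retrieval quality than hitting an exact chunk size.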
Pricing
| Tier | Pages/mo | Price |
|---|---|---|
| Free | 500 | $0 |
| Hobby | 3,000 | $16/mo |
| Standard | 100,000 | $83/mo |
| Self-hosted | Unlimited | Free |
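The paid tiers above are easier to compare as a cost per 1,000 pages. The following is a small worked calculation that uses only the figures in the table and assumes the monthly allowance is fully used. The names `TIERS` and `cost_per_thousand` are illustrative.

```python
# Monthly page allowance and price in USD, copied from the pricing table above.
TIERS = {
    "Hobby": (3_000, 16.0),
    "Standard": (100_000, 83.0),
}


def cost_per_thousand(pages: int, price: float) -> float:
    """Return the price per 1,000 pages when the monthly allowance is fully used."""
    return price / pages * 1000


for name, (pages, price) in TIERS.items():
    print(f"{name}: ${cost_per_thousand(pages, price):.2f} per 1,000 pages")
```

At full utilization this works out to about $5.33 per 1,000 pages on Hobby and $0.83 on Standard, so the larger tier is roughly six times cheaper per page.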
FAQ
Q: How does it handle JavaScript-heavy sites? A: Firecrawl uses headless browsers to render JavaScript before extraction.
Q: Can I self-host? A: Yes, fully open-source. Docker Compose deployment available.
Q: How does it compare to Jina Reader? A: Firecrawl offers full site crawling, structured extraction, and sitemap discovery. Jina Reader is simpler (URL prefix for single pages).
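To make the "URL prefix" point concrete, here is a minimal sketch of how a Jina Reader request URL is formed. It assumes the public reader endpoint is `https://r.jina.ai/`, which the text above does not name, and `jina_reader_url` is a hypothetical helper. The sketch only builds the URL and makes no network call.

```python
# Assumed public Jina Reader endpoint; not stated in the original text.
JINA_READER_PREFIX = "https://r.jina.ai/"


def jina_reader_url(page_url: str) -> str:
    """Build the reader URL for a single page by prepending the prefix."""
    return JINA_READER_PREFIX + page_url


print(jina_reader_url("https://example.com/docs"))
```

Each call covers exactly one page, which is why crawling a whole site this way means discovering and looping over the URLs yourself.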