# Firecrawl — Web Scraping API for LLMs

> Turn any website into clean markdown or structured data for AI. Handles JS rendering, anti-bot measures, and batch crawling. 97K+ stars.

## Quick Use

```bash
pip install firecrawl-py
```

```python
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-YOUR_KEY")

# Scrape a single page to markdown.
# (Call style as in older firecrawl-py releases; check your installed version.)
result = app.scrape_url("https://example.com", params={"formats": ["markdown"]})
print(result["markdown"])

# Crawl an entire site, capped at 50 pages
crawl = app.crawl_url("https://docs.example.com", params={"limit": 50})
for page in crawl["data"]:
    print(page["markdown"][:200])  # first 200 characters of each page
```
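
Result shapes may differ between SDK releases (dict-like responses in some, objects with attributes in others), so a small accessor can keep calling code stable. The helper below is a hypothetical sketch and not part of firecrawl-py; `get_markdown` and the sample values are introduced here purely for illustration.

```python
from types import SimpleNamespace
from typing import Any


def get_markdown(result: Any, default: str = "") -> str:
    """Return the markdown field from a dict-like or attribute-style result."""
    if isinstance(result, dict):
        value = result.get("markdown")
    else:
        value = getattr(result, "markdown", None)
    # Fall back to the default when the field is missing or not a string.
    return value if isinstance(value, str) else default


# Stand-ins for SDK responses (hypothetical data, no network call).
as_dict = {"markdown": "# Example Domain"}
as_object = SimpleNamespace(markdown="# Example Domain")
print(get_markdown(as_dict))
print(get_markdown(as_object))
```

In the Quick Use snippet, `print(get_markdown(result))` would then work with either response style.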

To self-host, run `docker compose up` from the cloned repo. Otherwise, use the hosted API at firecrawl.dev.

## Introduction

Firecrawl is a **web scraping API purpose-built for feeding data to LLMs**. It handles the hardest parts of web scraping — JavaScript rendering, anti-bot protection, rate limiting — and outputs clean markdown or structured JSON that's ready for AI consumption.

Core capabilities:

- **Scrape to Markdown** — Convert any URL into clean, LLM-friendly markdown. Automatically removes navigation, ads, and boilerplate
- **Full Site Crawling** — Recursively crawl entire websites with configurable depth, URL filters, and concurrent requests
- **Structured Extraction** — Define a JSON schema and extract structured data from pages using LLMs or rules
- **Map Discovery** — Get a list of a domain's URLs, including pages that are not listed in its sitemap.xml
- **JavaScript Rendering** — Full Chromium-based rendering for SPAs, dynamic content, and infinite scroll pages
- **Anti-Bot Handling** — Built-in proxy rotation, stealth mode, and CAPTCHA solving
- **Batch Operations** — Process thousands of URLs concurrently with automatic retries and rate limiting
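
To make the structured-extraction idea concrete, the sketch below writes a JSON Schema for the fields wanted from a page and checks a candidate record against it. The schema fields (`title`, `price`, `in_stock`), the sample records, and the `check_record` helper are all hypothetical. How a schema is passed to Firecrawl depends on the SDK version and is not shown here.

```python
from typing import Any

# Hypothetical schema for a product page, written as plain JSON Schema.
PRODUCT_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
    },
    "required": ["title", "price"],
}

# Only the three JSON types used above are supported by this sketch.
_PY_TYPES = {"string": str, "number": (int, float), "boolean": bool}


def check_record(record: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the record fits."""
    problems = []
    for name in schema.get("required", []):
        if name not in record:
            problems.append(f"missing required field: {name}")
    for name, spec in schema.get("properties", {}).items():
        if name not in record:
            continue
        value = record[name]
        ok = isinstance(value, _PY_TYPES[spec["type"]])
        # bool is a subclass of int in Python, so reject it for "number".
        if spec["type"] == "number" and isinstance(value, bool):
            ok = False
        if not ok:
            problems.append(f"{name}: expected {spec['type']}")
    return problems


print(check_record({"title": "Widget", "price": 9.5}, PRODUCT_SCHEMA))
print(check_record({"title": "Widget"}, PRODUCT_SCHEMA))
```

Checking extracted records this way before they reach a RAG index or database is a cheap guard, since LLM-based extraction can return missing or mistyped fields.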

97,000+ GitHub stars. SDKs for Python, Node.js, Go, and Rust. Used by major AI companies for training data and RAG pipelines.

## FAQ

**Q: How is Firecrawl different from Crawl4AI?**
A: Firecrawl is an API service (with self-host option) optimized for scale and reliability. Crawl4AI is a Python library you run locally. Firecrawl has more robust anti-bot handling and structured extraction. Choose Firecrawl for production pipelines, Crawl4AI for local scripts.

**Q: Can I self-host it?**
A: Yes. Clone the repo and run `docker compose up`. The self-hosted version is fully functional but requires your own proxy infrastructure for anti-bot features.

**Q: How much does the hosted API cost?**
A: Free tier includes 500 credits/month. Paid plans start at $19/month for 3,000 credits. One credit = one page scrape.
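
As a rough budgeting aid, the arithmetic in this answer can be wrapped in a few lines. The figures are the ones quoted above and may change; `estimate` is a hypothetical helper, and it assumes every page costs exactly one credit.

```python
FREE_CREDITS = 500      # credits per month on the free tier (as quoted above)
PAID_CREDITS = 3_000    # credits per month on the entry paid plan
PAID_PRICE_USD = 19.0   # monthly price of that plan


def estimate(pages_per_month: int) -> dict:
    """Estimate credit use and which of the two quoted tiers covers it."""
    credits = pages_per_month  # one credit per page scrape
    if credits <= FREE_CREDITS:
        plan = "free"
    elif credits <= PAID_CREDITS:
        plan = "entry paid"
    else:
        plan = "larger plan needed"
    return {"credits": credits, "plan": plan}


print(estimate(400))
print(estimate(2_500))
print(f"Entry plan cost per page: ${PAID_PRICE_USD / PAID_CREDITS:.4f}")
```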

**Q: Does it work with JavaScript-heavy sites like React/Next.js apps?**
A: Yes. Firecrawl uses headless Chromium for full JS rendering. It waits for dynamic content to load before extracting.

## Works With

- Python / Node.js / Go / Rust SDKs
- Any LLM for structured extraction (OpenAI, Anthropic, etc.)
- Docker for self-hosting
- REST API for integration with any language
- LangChain / LlamaIndex data loaders
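
For languages without an SDK, the REST route comes down to an authenticated JSON POST. The sketch below builds such a request with the Python standard library but does not send it. The endpoint path (`/v1/scrape`), the payload fields, and the `build_scrape_request` helper are assumptions for illustration; check the current API reference before relying on them.

```python
import json
import urllib.request

# Assumption: endpoint path and payload fields follow the hosted v1 API.
API_URL = "https://api.firecrawl.dev/v1/scrape"


def build_scrape_request(api_key: str, url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for markdown output."""
    body = json.dumps({"url": url, "formats": ["markdown"]}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_scrape_request("fc-YOUR_KEY", "https://example.com")
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req), then parse the JSON response body.
```

The same three pieces (URL, bearer token, JSON body) carry over to any HTTP client in any language.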

## Source & Thanks

- GitHub: [mendableai/firecrawl](https://github.com/mendableai/firecrawl)
- License: AGPL-3.0 (core), MIT (SDKs)
- Stars: 97,000+
- Maintainer: Mendable / Firecrawl team

Thanks to the Firecrawl team for building the most reliable web-to-AI data pipeline, solving the hardest web scraping challenges so AI developers can focus on building applications.


---
Source: https://tokrepo.com/en/workflows/cf56d8ce-a891-441d-af68-3dc0c5abd881
Author: TokRepo Picks