Scripts · Apr 2, 2026 · 3 min read

Firecrawl — Web Scraping API for LLMs

Turn any website into clean markdown or structured data for AI. Handles JavaScript rendering, anti-bot protection, and batch crawling. 97K+ stars.

TokRepo Featured · Community
Quick Use

Use it first, then decide how deep to go


```bash
pip install firecrawl-py
```

```python
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-YOUR_KEY")

# The dict-style `params=` argument matches older firecrawl-py releases;
# newer releases may take keyword arguments instead, so check your version.

# Scrape a single page to markdown
result = app.scrape_url("https://example.com", params={"formats": ["markdown"]})
print(result["markdown"])

# Crawl an entire site (up to 50 pages) and preview each page
crawl = app.crawl_url("https://docs.example.com", params={"limit": 50})
for page in crawl["data"]:
    print(page["markdown"][:200])
```

To self-host, run `docker compose up` from the repo; otherwise, use the hosted API at firecrawl.dev.
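The crawl loop above only prints a preview of each page. When feeding an LLM or a RAG index, a common next step is to split each page's markdown into bounded chunks. The sketch below is a plain-Python, paragraph-based splitter, not part of the Firecrawl SDK: `pages_to_chunks` is a hypothetical helper, and it assumes each page is a dict with a `"markdown"` string and an optional `metadata["sourceURL"]` entry (verify that key name against the response your SDK version returns).

```python
from typing import Any


def pages_to_chunks(pages: list[dict[str, Any]], max_chars: int = 1000) -> list[dict[str, str]]:
    """Split each crawled page's markdown into chunks of at most max_chars.

    Hypothetical helper: assumes each page is a dict with a "markdown" string
    and an optional "metadata" dict whose "sourceURL" key names the page.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: list[dict[str, str]] = []
    for page in pages:
        markdown = page.get("markdown") or ""
        source = (page.get("metadata") or {}).get("sourceURL", "")
        current = ""
        # Split on blank lines so headings and paragraphs stay whole.
        for block in markdown.split("\n\n"):
            block = block.strip()
            # Hard-cut any single block that is longer than max_chars.
            while len(block) > max_chars:
                if current:
                    chunks.append({"source": source, "text": current})
                    current = ""
                chunks.append({"source": source, "text": block[:max_chars]})
                block = block[max_chars:]
            if not block:
                continue
            candidate = f"{current}\n\n{block}" if current else block
            if len(candidate) <= max_chars:
                current = candidate  # block fits: keep growing this chunk
            else:
                chunks.append({"source": source, "text": current})
                current = block  # start a new chunk with this block
        if current:
            chunks.append({"source": source, "text": current})
    return chunks
```

Pass `crawl["data"]` from the snippet above as `pages`. Splitting on blank lines keeps headings and paragraphs intact; only a single block longer than `max_chars` gets cut mid-text.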
## Introduction

Firecrawl is a **web scraping API purpose-built for feeding data to LLMs**. It handles the hardest parts of web scraping (JavaScript rendering, anti-bot protection, rate limiting) and outputs clean markdown or structured JSON that is ready for AI consumption.

Core capabilities:

- **Scrape to Markdown**: Convert any URL into clean, LLM-friendly markdown, with navigation, ads, and boilerplate removed automatically.
- **Full Site Crawling**: Recursively crawl entire websites with configurable depth, URL filters, and concurrent requests.
- **Structured Extraction**: Define a JSON schema and extract structured data from pages using LLMs or rules.
- **Map Discovery**: Get a list of the URLs on a domain, including pages not listed in its sitemap.
- **JavaScript Rendering**: Full Chromium-based rendering for SPAs, dynamic content, and infinite-scroll pages.
- **Anti-Bot Handling**: Built-in proxy rotation, stealth mode, and CAPTCHA solving.
- **Batch Operations**: Process thousands of URLs concurrently with automatic retries and rate limiting.

97,000+ GitHub stars. SDKs for Python, Node.js, Go, and Rust. Used by major AI companies for training data and RAG pipelines.

## FAQ

**Q: How is Firecrawl different from Crawl4AI?**

A: Firecrawl is an API service (with a self-host option) optimized for scale and reliability, while Crawl4AI is a Python library you run locally. Firecrawl has more robust anti-bot handling and structured extraction. Choose Firecrawl for production pipelines and Crawl4AI for local scripts.

**Q: Can I self-host it?**

A: Yes. Clone the repo and run `docker compose up`. The self-hosted version is fully functional, but its anti-bot features require your own proxy infrastructure.

**Q: How much does the hosted API cost?**

A: The free tier includes 500 credits per month. Paid plans start at $19/month for 3,000 credits. One credit equals one page scrape.

**Q: Does it work with JavaScript-heavy sites like React/Next.js apps?**

A: Yes. Firecrawl uses headless Chromium for full JS rendering and waits for dynamic content to load before extracting.

## Works With

- Python / Node.js / Go / Rust SDKs
- Any LLM for structured extraction (OpenAI, Anthropic, etc.)
- Docker for self-hosting
- REST API for integration with any language
- LangChain / LlamaIndex data loaders
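Because the REST API works from any language, you do not strictly need an SDK. The stdlib-only Python sketch below shows the idea. The `/v1/scrape` path, the JSON body, the Bearer-token header, and the response shape are assumptions based on Firecrawl's v1 API and may differ in the API version you target, so check the official API reference. `build_scrape_request` and `scrape_markdown` are hypothetical helpers, and the self-hosted base URL `http://localhost:3002` is likewise an assumption.

```python
import json
import urllib.request

# Assumed hosted base URL; for a self-hosted instance pass your own, for
# example "http://localhost:3002" (assumed default port).
DEFAULT_BASE_URL = "https://api.firecrawl.dev"


def build_scrape_request(url: str, api_key: str, base_url: str = DEFAULT_BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for a page as markdown."""
    # Assumed v1 payload: the target URL plus the output formats wanted.
    body = json.dumps({"url": url, "formats": ["markdown"]}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/scrape",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def scrape_markdown(url: str, api_key: str, base_url: str = DEFAULT_BASE_URL) -> str:
    """Send the request and return the page markdown from the JSON response."""
    req = build_scrape_request(url, api_key, base_url)
    with urllib.request.urlopen(req, timeout=60) as resp:
        payload = json.load(resp)
    # Assumed response shape: {"success": true, "data": {"markdown": "..."}}
    return payload["data"]["markdown"]
```

Keeping request construction separate from sending lets you inspect or test the payload without network access; `scrape_markdown("https://example.com", "fc-YOUR_KEY")` performs the actual call.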

Source & Thanks

- GitHub: [mendableai/firecrawl](https://github.com/mendableai/firecrawl)
- License: AGPL-3.0 (core), MIT (SDKs)
- Stars: 97,000+
- Maintainer: Mendable / Firecrawl team

Thanks to the Firecrawl team for building the most reliable web-to-AI data pipeline and solving the hardest web scraping challenges, so that AI developers can focus on building applications.
