# Firecrawl: Web Scraping API for LLMs

> Turn any website into clean markdown or structured data for AI. Handles JS rendering, anti-bot measures, and batch crawling. 97K+ GitHub stars.

## Install

Install the Python SDK:

```bash
pip install firecrawl-py
```

## Quick Use

Save the following as a script file and run it:

```python
from firecrawl import FirecrawlApp

# Replace the placeholder with your own API key.
# Call style as in the original snippet; SDK signatures can differ between firecrawl-py releases.
app = FirecrawlApp(api_key="fc-YOUR_KEY")

# Scrape a single page to markdown
result = app.scrape_url("https://example.com", params={"formats": ["markdown"]})
print(result["markdown"])

# Crawl an entire site (capped at 50 pages) and preview each page
crawl = app.crawl_url("https://docs.example.com", params={"limit": 50})
for page in crawl["data"]:
    print(page["markdown"][:200])
```

To self-host, run `docker compose up` from the repo. Otherwise, use the hosted API at firecrawl.dev.

## Introduction

Firecrawl is a **web scraping API purpose-built for feeding data to LLMs**. It handles the hardest parts of web scraping (JavaScript rendering, anti-bot protection, rate limiting) and outputs clean markdown or structured JSON that is ready for AI consumption.

Core capabilities:

- **Scrape to Markdown**: Convert any URL into clean, LLM-friendly markdown. Navigation, ads, and boilerplate are removed automatically.
- **Full Site Crawling**: Recursively crawl entire websites with configurable depth, URL filters, and concurrent requests.
- **Structured Extraction**: Define a JSON schema and extract structured data from pages using LLMs or rules.
- **Map Discovery**: Get a complete sitemap of any domain, including URLs not referenced in robots.txt.
- **JavaScript Rendering**: Full Chromium-based rendering for SPAs, dynamic content, and infinite-scroll pages.
- **Anti-Bot Handling**: Built-in proxy rotation, stealth mode, and CAPTCHA solving.
- **Batch Operations**: Process thousands of URLs concurrently with automatic retries and rate limiting.

97,000+ GitHub stars. SDKs for Python, Node.js, Go, and Rust. Used by major AI companies for training data and RAG pipelines.

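
Because the output is markdown intended for RAG pipelines, a common next step is to split each scraped page into heading-delimited chunks before embedding them. The sketch below is one minimal way to do that with the standard library only. It is not part of Firecrawl: the function name `split_markdown_by_heading` and the chunk shape (`title`, `text`) are assumptions made for illustration, and the sample string stands in for the markdown a scrape would return.

```python
import re

# ATX-style markdown headings: one to six '#' characters, then whitespace and text.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)$")


def split_markdown_by_heading(markdown: str) -> list[dict]:
    """Split markdown into one chunk per heading section (hypothetical helper)."""
    chunks = []
    title = ""        # heading of the current section ("" before the first heading)
    body = []         # lines collected for the current section
    in_fence = False  # True while inside a fenced code block

    def flush():
        text = "\n".join(body).strip()
        if title or text:
            chunks.append({"title": title, "text": text})

    for line in markdown.splitlines():
        # Toggle fence state so '# comment' lines inside code are not treated as headings.
        if line.lstrip().startswith("`" * 3):
            in_fence = not in_fence
        match = None if in_fence else HEADING_RE.match(line)
        if match:
            flush()
            title = match.group(2).strip()
            body = []
        else:
            body.append(line)
    flush()
    return chunks


# Stand-in for the markdown returned by a scrape.
sample = (
    "# Guide\nIntro text.\n\n"
    "## Install\npip install firecrawl-py\n\n"
    "## Usage\nCall the API.\n"
)
chunks = split_markdown_by_heading(sample)
for chunk in chunks:
    print(chunk["title"], "->", len(chunk["text"]), "chars")
```

Splitting on headings keeps each chunk topically coherent, which tends to help retrieval. Very long sections may still need a secondary size-based split.
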
## FAQ

**Q: How is Firecrawl different from Crawl4AI?**
A: Firecrawl is an API service (with a self-host option) optimized for scale and reliability. Crawl4AI is a Python library you run locally. Firecrawl has more robust anti-bot handling and structured extraction. Choose Firecrawl for production pipelines and Crawl4AI for local scripts.

**Q: Can I self-host it?**
A: Yes. Clone the repo and run `docker compose up`. The self-hosted version is fully functional but requires your own proxy infrastructure for anti-bot features.

**Q: How much does the hosted API cost?**
A: The free tier includes 500 credits/month. Paid plans start at $19/month for 3,000 credits. One credit equals one page scrape.

**Q: Does it work with JavaScript-heavy sites like React/Next.js apps?**
A: Yes. Firecrawl uses headless Chromium for full JS rendering and waits for dynamic content to load before extracting.

## Works With

- Python / Node.js / Go / Rust SDKs
- Any LLM for structured extraction (OpenAI, Anthropic, etc.)
- Docker for self-hosting
- REST API for integration with any language
- LangChain / LlamaIndex data loaders

## Source & Thanks

- GitHub: [mendableai/firecrawl](https://github.com/mendableai/firecrawl)
- License: AGPL-3.0 (core), MIT (SDKs)
- Stars: 97,000+
- Maintainer: Mendable / Firecrawl team

Thanks to the Firecrawl team for building the most reliable web-to-AI data pipeline, solving the hardest web scraping challenges so AI developers can focus on building applications.

---

Source: https://tokrepo.com/en/workflows/cf56d8ce-a891-441d-af68-3dc0c5abd881
Author: TokRepo Featured (TokRepo精选)