Scripts · April 2, 2026 · 1 min read

Firecrawl — Web Scraping API for LLMs

Turn any website into clean markdown or structured data for AI. Handles JS rendering, anti-bot, batch crawling. 97K+ stars.

TokRepo Picks · Community
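Quick Start

Try it first, then decide whether to dig deeper

This section should tell both users and agents what to copy first, what to install, and where it goes.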

```bash
pip install firecrawl-py
```

```python
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-YOUR_KEY")

# Scrape a single page to markdown
result = app.scrape_url("https://example.com", params={"formats": ["markdown"]})
print(result["markdown"])

# Crawl an entire site
crawl = app.crawl_url("https://docs.example.com", params={"limit": 50})
for page in crawl["data"]:
    print(page["markdown"][:200])
```

Self-host: `docker compose up` from the repo. Or use the hosted API at firecrawl.dev.
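The crawl loop in the quick start indexes `page["markdown"]` directly, which raises `KeyError` if a page comes back without markdown. The sketch below is one way to guard against that. It assumes the dict shape used in the quick start (`{"data": [{"markdown": ...}]}`); the helper name `preview_pages` and the sample data are hypothetical, and SDK versions may return a different result type, so treat this as an illustration rather than the library's own behavior.

```python
from typing import Any, Dict, List


def preview_pages(crawl: Dict[str, Any], width: int = 200) -> List[str]:
    """Return a short markdown preview for each crawled page.

    Assumes the dict shape from the quick start: {"data": [{"markdown": "..."}]}.
    Pages with missing or empty markdown are skipped instead of raising KeyError.
    """
    previews = []
    for page in crawl.get("data") or []:
        markdown = page.get("markdown") or ""
        if markdown:
            previews.append(markdown[:width])
    return previews


# Stand-in data with the same shape, so the sketch runs without an API key.
sample_crawl = {
    "data": [
        {"markdown": "# Docs home\n\nWelcome."},
        {"metadata": {"statusCode": 404}},  # hypothetical page with no markdown
        {"markdown": ""},                    # empty markdown is skipped too
    ]
}

print(preview_pages(sample_crawl, width=11))
```

With a real crawl, the same helper could be called as `preview_pages(crawl)` in place of the `for` loop above.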
## Introduction

Firecrawl is a **web scraping API purpose-built for feeding data to LLMs**. It handles the hardest parts of web scraping (JavaScript rendering, anti-bot protection, rate limiting) and outputs clean markdown or structured JSON that's ready for AI consumption.

Core capabilities:

- **Scrape to Markdown**: Convert any URL into clean, LLM-friendly markdown. Automatically removes navigation, ads, and boilerplate
- **Full Site Crawling**: Recursively crawl entire websites with configurable depth, URL filters, and concurrent requests
- **Structured Extraction**: Define a JSON schema and extract structured data from pages using LLMs or rules
- **Map Discovery**: Get a complete sitemap of any domain, including URLs not in robots.txt
- **JavaScript Rendering**: Full Chromium-based rendering for SPAs, dynamic content, and infinite scroll pages
- **Anti-Bot Handling**: Built-in proxy rotation, stealth mode, and CAPTCHA solving
- **Batch Operations**: Process thousands of URLs concurrently with automatic retries and rate limiting

97,000+ GitHub stars. SDKs for Python, Node.js, Go, and Rust. Used by major AI companies for training data and RAG pipelines.

## FAQ

**Q: How is Firecrawl different from Crawl4AI?**

A: Firecrawl is an API service (with self-host option) optimized for scale and reliability. Crawl4AI is a Python library you run locally. Firecrawl has more robust anti-bot handling and structured extraction. Choose Firecrawl for production pipelines, Crawl4AI for local scripts.

**Q: Can I self-host it?**

A: Yes. Clone the repo and run `docker compose up`. The self-hosted version is fully functional but requires your own proxy infrastructure for anti-bot features.

**Q: How much does the hosted API cost?**

A: Free tier includes 500 credits/month. Paid plans start at $19/month for 3,000 credits. One credit = one page scrape.

**Q: Does it work with JavaScript-heavy sites like React/Next.js apps?**

A: Yes. Firecrawl uses headless Chromium for full JS rendering. It waits for dynamic content to load before extracting.

## Works With

- Python / Node.js / Go / Rust SDKs
- Any LLM for structured extraction (OpenAI, Anthropic, etc.)
- Docker for self-hosting
- REST API for integration with any language
- LangChain / LlamaIndex data loaders
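For languages without an SDK, the REST API listed above can be called directly. The following is a minimal sketch using only the Python standard library. It builds the request without sending it, so it runs without a key. The base URL, the `v1` path segment, and the `url`/`formats` body fields are assumptions based on the hosted API at firecrawl.dev and the quick-start parameters; the helper name `build_scrape_request` is hypothetical. Check the official API reference for the version your account uses.

```python
import json
import urllib.request

# Assumption: hosted API base URL; a self-hosted instance would use its own host.
API_BASE = "https://api.firecrawl.dev"


def build_scrape_request(
    url: str, api_key: str, api_version: str = "v1"
) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for one page as markdown."""
    # Mirrors the quick-start call: one URL, markdown output.
    payload = {"url": url, "formats": ["markdown"]}
    return urllib.request.Request(
        f"{API_BASE}/{api_version}/scrape",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_scrape_request("https://example.com", "fc-YOUR_KEY")
    print(req.get_method(), req.full_url)
    # To send it: urllib.request.urlopen(req), then json.load() the response.
```

Keeping request construction separate from sending makes the payload easy to inspect and unit test before any credits are spent.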

Sources and Acknowledgments

- GitHub: [mendableai/firecrawl](https://github.com/mendableai/firecrawl)
- License: AGPL-3.0 (core), MIT (SDKs)
- Stars: 97,000+
- Maintainer: Mendable / Firecrawl team

Thanks to the Firecrawl team for building the most reliable web-to-AI data pipeline, solving the hardest web scraping challenges so AI developers can focus on building applications.
