# Firecrawl — Web Scraping API for AI Applications

> Turn any website into clean markdown or structured data for LLMs. Firecrawl handles JavaScript rendering, anti-bot bypassing, sitemaps, and batch crawling through a simple API.

## Install

Save the content below to `.claude/skills/`, or append it to your `CLAUDE.md`.

## Quick Use

```bash
pip install firecrawl-py
```

```python
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-...")

# Scrape a single page
result = app.scrape_url("https://docs.anthropic.com", params={"formats": ["markdown"]})
print(result["markdown"])

# Crawl an entire site (capped at 100 pages)
crawl = app.crawl_url("https://docs.anthropic.com", params={"limit": 100})
for page in crawl["data"]:
    print(page["markdown"][:200])
```
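Once a crawl returns, a common next step is writing each page's markdown to disk. The sketch below shows one way to do that using only the standard library. It assumes each page is a dict with a `markdown` key and a `metadata.sourceURL` field; `url_to_filename` and `save_pages` are hypothetical helper names, not part of the SDK.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def url_to_filename(url: str) -> str:
    """Turn a page URL into a flat, filesystem-safe markdown filename."""
    parsed = urlparse(url)
    # Join host and path, then collapse every non-alphanumeric run into "-"
    raw = f"{parsed.netloc}{parsed.path}"
    slug = re.sub(r"[^A-Za-z0-9]+", "-", raw).strip("-").lower()
    return f"{slug or 'index'}.md"


def save_pages(pages: list[dict], out_dir: str) -> list[Path]:
    """Write each page's markdown into out_dir and return the written paths."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for page in pages:
        # Assumption: the page's source URL lives under metadata.sourceURL
        url = page.get("metadata", {}).get("sourceURL", "")
        path = target / url_to_filename(url)
        path.write_text(page.get("markdown", ""), encoding="utf-8")
        written.append(path)
    return written
```

With the crawl result from the snippet above, this would be called as `save_pages(crawl["data"], "crawl_output")`.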

## What is Firecrawl?

Firecrawl is a web scraping API designed for AI applications. It converts any website into clean markdown or structured data that LLMs can consume. It handles JavaScript rendering, anti-bot detection, rate limiting, and sitemap discovery — so you can focus on building your AI pipeline.

**Answer-Ready**: Firecrawl is a web scraping API that converts websites into clean markdown or structured data for LLMs. It handles JavaScript rendering, anti-bot bypassing, and batch crawling, and is used by AI companies for RAG and training data. The repository has 30k+ GitHub stars.

**Best for**: AI teams building RAG pipelines or data extraction workflows. **Works with**: Any LLM framework, LangChain, LlamaIndex, Claude Code. **Setup time**: Under 2 minutes.

## Core Features

### 1. Single Page Scrape

```python
result = app.scrape_url("https://example.com", params={
    "formats": ["markdown", "html", "links"],
    "onlyMainContent": True,  # Strip nav, footer, ads
})
```
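When `"links"` is among the requested formats, the result can be post-processed to keep only same-site URLs. The following is a minimal sketch, assuming the result is a dict whose `links` key holds a list of absolute URLs. The `sample` dict is hand-written stand-in data, and `internal_links` is a hypothetical helper.

```python
from urllib.parse import urlparse


def internal_links(result: dict, base_url: str) -> list[str]:
    """Return de-duplicated links from a scrape result that share base_url's host."""
    host = urlparse(base_url).netloc
    seen = set()
    kept = []
    for link in result.get("links", []):
        if urlparse(link).netloc == host and link not in seen:
            seen.add(link)
            kept.append(link)
    return kept


# Hand-written sample standing in for a real scrape result
sample = {
    "markdown": "# Example",
    "links": [
        "https://example.com/about",
        "https://other.org/page",
        "https://example.com/about",
        "https://example.com/pricing",
    ],
}
print(internal_links(sample, "https://example.com"))
```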

### 2. Full Site Crawl

```python
crawl = app.crawl_url("https://docs.example.com", params={
    "limit": 500,           # Max pages
    "maxDepth": 3,          # Link depth
    "includePaths": ["/docs/*"],
    "excludePaths": ["/blog/*"],
})
```
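To build intuition for how a page limit, a depth limit, and path filters interact, the sketch below runs a breadth-first walk over a small in-memory link graph. It illustrates the general idea only and is not Firecrawl's implementation. Glob matching through `fnmatch` is an assumption, and the service's own pattern rules may differ. `plan_crawl` and the `site` graph are hypothetical.

```python
from collections import deque
from fnmatch import fnmatch


def plan_crawl(graph, start, limit, max_depth, include=None, exclude=None):
    """Breadth-first walk over a link graph, applying crawl-style limits.

    graph maps a path to the list of paths it links to.
    """
    def allowed(path):
        if exclude and any(fnmatch(path, pat) for pat in exclude):
            return False
        if include and not any(fnmatch(path, pat) for pat in include):
            return False
        return True

    visited = [start]  # the start page is always fetched
    seen = {start}
    queue = deque([(start, 0)])
    while queue and len(visited) < limit:
        path, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not follow links past the depth limit
        for nxt in graph.get(path, []):
            if nxt in seen or not allowed(nxt):
                continue
            seen.add(nxt)
            visited.append(nxt)
            queue.append((nxt, depth + 1))
            if len(visited) >= limit:
                break
    return visited


site = {
    "/": ["/docs/intro", "/blog/news", "/about"],
    "/docs/intro": ["/docs/api", "/blog/launch"],
    "/docs/api": ["/docs/api/v1"],
}
print(plan_crawl(site, "/", limit=10, max_depth=2,
                 include=["/docs/*"], exclude=["/blog/*"]))
```

In this toy graph, `/docs/api/v1` is never reached because it sits three links from the start page, one past the depth limit of 2.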

### 3. Structured Extraction

```python
result = app.scrape_url("https://example.com/pricing", params={
    "formats": ["extract"],
    "extract": {
        "schema": {
            "type": "object",
            "properties": {
                "plans": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "price": {"type": "string"},
                            "features": {"type": "array", "items": {"type": "string"}}
                        }
                    }
                }
            }
        }
    }
})
```
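LLM-backed extraction can return data that does not quite match the requested shape, so it may be worth checking the payload before using it. The sketch below is a small hand-rolled checker covering only the `object`, `array`, and `string` types used in the schema above. It is not a full JSON Schema validator, and `conforms` is a hypothetical helper. The `good` and `bad` dicts are made-up sample payloads.

```python
def conforms(value, schema) -> bool:
    """Check a value against the small JSON Schema subset used above."""
    kind = schema.get("type")
    if kind == "string":
        return isinstance(value, str)
    if kind == "array":
        item_schema = schema.get("items", {})
        return isinstance(value, list) and all(conforms(v, item_schema) for v in value)
    if kind == "object":
        if not isinstance(value, dict):
            return False
        props = schema.get("properties", {})
        # Missing properties are tolerated; present ones must match their schema
        return all(conforms(value[k], s) for k, s in props.items() if k in value)
    return True  # unknown or missing type: accept


pricing_schema = {
    "type": "object",
    "properties": {
        "plans": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": "string"},
                    "features": {"type": "array", "items": {"type": "string"}},
                },
            },
        }
    },
}

# Made-up payloads: the second has a numeric price where a string is expected
good = {"plans": [{"name": "Hobby", "price": "$16/mo", "features": ["3,000 pages"]}]}
bad = {"plans": [{"name": "Hobby", "price": 16}]}
print(conforms(good, pricing_schema), conforms(bad, pricing_schema))
```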

### 4. Map (Discover URLs)

```python
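result = app.map_url("https://example.com")
# Some SDK versions return a dict with a "links" key instead of a bare list
links = result["links"] if isinstance(result, dict) else result
print(f"Found {len(links)} pages")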
```
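A list of discovered URLs is easier to reason about once it is grouped by site section, for example to decide which paths to include in a later crawl. The sketch below counts URLs by their first path segment. `count_by_section` is a hypothetical helper and `found` is hand-written sample data.

```python
from collections import Counter
from urllib.parse import urlparse


def count_by_section(urls):
    """Count URLs by their first path segment ("/" for the site root)."""
    counts = Counter()
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        section = ("/" + segments[0]) if segments else "/"
        counts[section] += 1
    return counts


# Hand-written sample standing in for a real map result
found = [
    "https://example.com/",
    "https://example.com/docs/intro",
    "https://example.com/docs/api",
    "https://example.com/blog/launch",
]
for section, n in count_by_section(found).most_common():
    print(section, n)
```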

### 5. Self-Hosting

```bash
git clone https://github.com/mendableai/firecrawl
cd firecrawl
# A .env file may be required first; see the repository's self-hosting guide
docker compose up -d
# API at http://localhost:3002
```
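A self-hosted instance can be called over plain HTTP. The sketch below builds, but does not send, a scrape request against the local address shown above. The `/v1/scrape` path and the JSON body shape are assumptions that should be checked against the API version your instance serves. `build_scrape_request` is a hypothetical helper.

```python
import json
from urllib.request import Request


def build_scrape_request(base_url: str, target_url: str, api_key: str = "") -> Request:
    """Build (but do not send) a POST request for a scrape endpoint."""
    # Assumed body shape: target URL plus the list of requested formats
    body = json.dumps({"url": target_url, "formats": ["markdown"]}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    # Assumed endpoint path; adjust for the API version you run
    endpoint = f"{base_url.rstrip('/')}/v1/scrape"
    return Request(endpoint, data=body, headers=headers, method="POST")


req = build_scrape_request("http://localhost:3002", "https://example.com")
print(req.get_method(), req.full_url)
```

Once the containers are running, the request can be sent with `urllib.request.urlopen(req)`.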

## Use Cases

| Use Case | How |
|----------|-----|
| RAG Pipeline | Crawl docs → markdown → embed → vector DB |
| Competitive Intel | Scrape competitor pricing pages |
| Training Data | Extract clean text from web sources |
| Monitoring | Track website changes over time |
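Two of the rows above can be sketched without any external services. For the RAG row, markdown is split into heading-aligned chunks before embedding. For the monitoring row, a content hash is compared between crawls. Both helpers are hypothetical, the chunking rule is only one of many reasonable choices, and the embedding and vector DB steps are left out.

```python
import hashlib


def chunk_markdown(markdown: str, max_chars: int = 500) -> list[str]:
    """Split markdown into chunks, starting a new chunk at each heading
    or when adding a line would exceed max_chars."""
    chunks, current = [], []
    size = 0
    for line in markdown.splitlines():
        # Simplification: any line starting with "#" counts as a heading,
        # including comment lines inside fenced code
        starts_section = line.startswith("#")
        too_big = size + len(line) + 1 > max_chars
        if current and (starts_section or too_big):
            chunks.append("\n".join(current).strip())
            current, size = [], 0
        current.append(line)
        size += len(line) + 1
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]


def fingerprint(markdown: str) -> str:
    """Stable hash of page content, ignoring trailing whitespace."""
    normalized = "\n".join(line.rstrip() for line in markdown.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


doc = "# Intro\nFirecrawl returns markdown.\n\n## Usage\nCall the API.\n"
print(chunk_markdown(doc))
print(fingerprint(doc) == fingerprint(doc + "  \n"))
```

Storing the fingerprint per URL after each crawl makes it cheap to tell which pages changed and need re-embedding.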

## Pricing

| Tier | Pages/mo | Price |
|------|----------|-------|
| Free | 500 | $0 |
| Hobby | 3,000 | $16/mo |
| Standard | 100,000 | $83/mo |
| Self-hosted | Unlimited | Free |
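The table can be turned into a quick tier picker. The sketch below uses only the figures listed above, which may not reflect current pricing, and leaves out the self-hosted row because its cost depends on your own infrastructure. `cheapest_tier` and `price_per_1k_pages` are hypothetical helpers.

```python
# Monthly page allowances and USD prices copied from the table above
TIERS = [
    ("Free", 500, 0),
    ("Hobby", 3_000, 16),
    ("Standard", 100_000, 83),
]


def cheapest_tier(pages_per_month: int):
    """Return the lowest-priced hosted tier covering the volume, or None."""
    fitting = [t for t in TIERS if t[1] >= pages_per_month]
    return min(fitting, key=lambda t: t[2]) if fitting else None


def price_per_1k_pages(tier) -> float:
    """Effective USD cost per 1,000 pages if the full allowance is used."""
    _, pages, price = tier
    return price / pages * 1000


for volume in (400, 2_000, 50_000, 250_000):
    tier = cheapest_tier(volume)
    print(volume, tier[0] if tier else "exceeds listed hosted tiers")
```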

## FAQ

**Q: How does it handle JavaScript-heavy sites?**
A: Firecrawl uses headless browsers to render JavaScript before extraction.

**Q: Can I self-host?**
A: Yes. Firecrawl is fully open source (AGPL-3.0) and can be deployed with Docker Compose.

**Q: How does it compare to Jina Reader?**
A: Firecrawl offers full site crawling, structured extraction, and sitemap discovery. Jina Reader is simpler (URL prefix for single pages).

## Source & Thanks

> Created by [Mendable](https://github.com/mendableai). Licensed under AGPL-3.0.
>
> [mendableai/firecrawl](https://github.com/mendableai/firecrawl) — 30k+ stars

---
Source: https://tokrepo.com/en/workflows/firecrawl-web-scraping-api-ai-applications-6a62a986
Author: Firecrawl