# Crawl4AI — LLM-Ready Web Crawler, 25K Stars

> Open-source Python web crawler built for AI and LLMs. Extracts clean markdown from any website with anti-bot bypass, structured extraction, and session management. 25,000+ GitHub stars.

## Quick Use

1. Install: `pip install crawl4ai`
2. Run setup: `crawl4ai-setup` (downloads browser)
3. Use in your script:

```python
import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url="https://example.com")
        print(result.markdown)  # Clean markdown for LLMs

asyncio.run(main())  # `async with` must run inside a coroutine
```

---

## Intro

Crawl4AI is an open-source Python web crawling framework built for AI applications and LLM data pipelines, with 25,000+ GitHub stars. It extracts clean, structured markdown from websites and handles JavaScript rendering, anti-bot detection, and session management for you. It is best suited to AI developers building RAG pipelines, research agents, or data extraction tools. Works with: Claude Code, LangChain, LlamaIndex, CrewAI. Setup time: under 2 minutes.

---

## Core Features

### LLM-Optimized Output
Crawl4AI outputs clean markdown by default, so you do not need to parse HTML yourself. Every crawl result includes `result.markdown`, ready to feed into an LLM context window.
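
Long pages can exceed a model's context window, so the markdown usually needs to be split before it is sent to an LLM. The sketch below is one possible approach, not part of Crawl4AI: the helper name `chunk_markdown` and the character budget `max_chars` are assumptions made for illustration. It splits at headings and packs sections into chunks of bounded size. It does not treat fenced code specially, so a `# comment` line inside a code block is also taken as a split point.

```python
import re


def chunk_markdown(markdown: str, max_chars: int = 2000) -> list[str]:
    """Split markdown at headings, then pack sections into chunks of at most max_chars."""
    # Split before every line that starts with 1-6 '#' characters and a space
    sections = [s.strip() for s in re.split(r"(?m)^(?=#{1,6} )", markdown) if s.strip()]
    chunks: list[str] = []
    current = ""
    for section in sections:
        candidate = f"{current}\n\n{section}" if current else section
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single oversized section is hard-split so no chunk exceeds the budget
        while len(section) > max_chars:
            chunks.append(section[:max_chars])
            section = section[max_chars:]
        current = section
    if current:
        chunks.append(current)
    return chunks


sample = "# Title\n\nIntro text.\n\n## A\n\nAlpha.\n\n## B\n\nBeta."
for chunk in chunk_markdown(sample, max_chars=30):
    print(repr(chunk))
```

In a real pipeline you would pass `result.markdown` instead of `sample` and pick a budget that matches your model, ideally measured in tokens rather than characters.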

### Structured Extraction
Extract specific data using CSS selectors, XPath, or LLM-based extraction strategies:

```python
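import asyncio
from crawl4ai import AsyncWebCrawler
from crawl4ai.extraction_strategy import LLMExtractionStrategy

async def main(url):
    # Argument names follow this guide; verify them against your installed version
    strategy = LLMExtractionStrategy(
        provider="openai/gpt-4",
        instruction="Extract all product names and prices",
    )
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url, extraction_strategy=strategy)
        print(result.extracted_content)  # Extracted data as a JSON string

asyncio.run(main("https://example.com"))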
```
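
LLM-based extraction returns text, so downstream code typically validates it before use. The following is a minimal sketch, assuming the extracted content is a JSON array of objects with `name` and `price` keys. That shape is an assumption that depends on your instruction and model, and `Product` and `parse_products` are hypothetical names, not Crawl4AI APIs.

```python
import json
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    price: float


def parse_products(extracted_content: str) -> list[Product]:
    """Turn a JSON string of product objects into Product records, skipping malformed items."""
    items = json.loads(extracted_content)
    products = []
    for item in items:
        try:
            # Prices often arrive as strings such as "$1,299.00"; normalise them to floats
            raw_price = str(item["price"]).replace("$", "").replace(",", "").strip()
            products.append(Product(name=str(item["name"]).strip(), price=float(raw_price)))
        except (KeyError, TypeError, ValueError):
            continue  # Missing field, wrong item type, or unparseable price
    return products


# Assumed sample of what an extraction run might return
sample = '[{"name": "Laptop", "price": "$1,299.00"}, {"name": "Mouse", "price": 25}, {"name": "Broken"}]'
print(parse_products(sample))
```

Skipping malformed items rather than raising keeps one bad record from discarding a whole page; log the skipped items if you need to audit extraction quality.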

### Anti-Bot Bypass
Built-in stealth mode with browser fingerprint rotation, proxy support, and human-like behavior simulation. Handles Cloudflare, DataDome, and other protection systems.

### Batch Crawling
Crawl hundreds of pages concurrently with rate limiting:

```python
# Run inside an async function, with `crawler` opened as in Quick Use
urls = ["https://site.com/page1", "https://site.com/page2"]
results = await crawler.arun_many(urls, max_concurrent=10)
```
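
To show what a concurrency cap means in practice, here is a minimal sketch of bounded concurrency using only the standard library. It is not how Crawl4AI implements `arun_many`: `fetch_page` is a hypothetical stand-in that sleeps instead of touching the network, and `crawl_all` is an illustrative name.

```python
import asyncio


async def fetch_page(url: str) -> str:
    """Hypothetical stand-in for a real crawl call; it sleeps instead of hitting the network."""
    await asyncio.sleep(0.01)
    return f"content of {url}"


async def crawl_all(urls: list[str], max_concurrent: int = 10) -> list[str]:
    """Fetch every URL, allowing at most max_concurrent fetches in flight at once."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded_fetch(url: str) -> str:
        async with semaphore:  # Waits here while max_concurrent fetches are running
            return await fetch_page(url)

    # gather returns results in the same order as the input URLs
    return await asyncio.gather(*(bounded_fetch(u) for u in urls))


urls = [f"https://site.com/page{i}" for i in range(1, 6)]
results = asyncio.run(crawl_all(urls, max_concurrent=2))
print(results[0])
```

A semaphore limits how many requests are in flight at once but not how many are sent per second; if a site needs a request-rate limit, add a delay inside `bounded_fetch` as well.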

## Key Stats
- 25,000+ GitHub stars
- 300+ contributors
- Supports 50+ website protection bypasses
- Output formats: Markdown, JSON, HTML, screenshots
- Python 3.8+ compatible

## FAQ

**Q: What is Crawl4AI?**
A: Crawl4AI is an open-source Python web crawler that extracts clean markdown from websites, purpose-built for feeding data into LLMs and AI applications.

**Q: Is Crawl4AI free?**
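A: Yes, it is fully open-source under the Apache 2.0 license. The crawler itself needs no API keys or paid plans; LLM-based extraction does need credentials for whichever model provider you choose.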

**Q: How does Crawl4AI compare to Scrapy?**
A: Crawl4AI focuses on AI/LLM use cases with built-in markdown extraction and JavaScript rendering. Scrapy is a general-purpose framework requiring more setup for AI pipelines.

---

## Source & Thanks

> Created by [unclecode](https://github.com/unclecode). Licensed under Apache 2.0.
>
> [crawl4ai](https://github.com/unclecode/crawl4ai) — ⭐ 25,000+

Thanks to the Crawl4AI team for building the go-to web crawler for the AI era.

---
Source: https://tokrepo.com/en/workflows/cb733d4d-f66b-477d-b69b-61d3322ad8dd
Author: Script Depot