# Crawl4AI: LLM-Ready Web Crawler, 25K Stars

> Open-source Python web crawler built for AI and LLMs. Extracts clean markdown from any website with anti-bot bypass, structured extraction, and session management. 25,000+ GitHub stars.

## Quick Use

1. Install: `pip install crawl4ai`
2. Run setup: `crawl4ai-setup` (downloads browser)
3. Use in your script:

```python
import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    # `async with` is only valid inside a coroutine, so wrap the crawl in main().
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url="https://example.com")
        print(result.markdown)  # Clean markdown for LLMs

asyncio.run(main())
```

---

## Intro

Crawl4AI is an open-source Python web crawling framework purpose-built for AI applications and LLM data pipelines, with 25,000+ GitHub stars. It extracts clean, structured markdown from any website and handles JavaScript rendering, anti-bot detection, and session management automatically.

Best for AI developers building RAG pipelines, research agents, or data extraction tools.

Works with: Claude Code, LangChain, LlamaIndex, CrewAI.

Setup time: under 2 minutes.

---

## Core Features

### LLM-Optimized Output

Crawl4AI outputs clean markdown by default, so no separate HTML parsing step is needed. Every crawl result includes `result.markdown`, ready to feed into an LLM context window.

### Structured Extraction

Extract specific data using CSS selectors, XPath, or LLM-based extraction strategies:

```python
from crawl4ai.extraction_strategy import LLMExtractionStrategy

# Describe in plain language what the model should pull out of the page.
strategy = LLMExtractionStrategy(
    provider="openai/gpt-4",
    instruction="Extract all product names and prices"
)

# Run inside the `async with AsyncWebCrawler() as crawler:` block from Quick Use;
# `url` is the address of the page to crawl.
result = await crawler.arun(url=url, extraction_strategy=strategy)
```

### Anti-Bot Bypass

Built-in stealth mode with browser fingerprint rotation, proxy support, and human-like behavior simulation. Handles Cloudflare, DataDome, and other protection systems.
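To make two of the ideas above concrete (proxy rotation and less machine-regular timing), here is a minimal, library-independent sketch. It is not how Crawl4AI implements stealth mode internally. The function name `make_request_plan`, its parameters, and the proxy addresses are hypothetical and chosen only for illustration.

```python
import itertools
import random


def make_request_plan(urls, proxies, base_delay=1.0, jitter=0.5, seed=None):
    """Pair each URL with a proxy (round-robin) and a randomized pause in seconds.

    Hypothetical helper for illustration only; not part of Crawl4AI.
    """
    if not proxies:
        raise ValueError("at least one proxy is required")
    rng = random.Random(seed)  # seedable so the plan is reproducible
    proxy_cycle = itertools.cycle(proxies)
    plan = []
    for url in urls:
        # Uniform jitter keeps the gaps between requests from being identical.
        delay = base_delay + rng.uniform(0.0, jitter)
        plan.append({"url": url, "proxy": next(proxy_cycle), "delay": delay})
    return plan


if __name__ == "__main__":
    # Placeholder proxy addresses; substitute real ones.
    demo = make_request_plan(
        ["https://example.com/a", "https://example.com/b", "https://example.com/c"],
        ["http://proxy-1.invalid:8080", "http://proxy-2.invalid:8080"],
        seed=0,
    )
    for step in demo:
        print(f"{step['url']} via {step['proxy']} after {step['delay']:.2f}s")
```

A plan like this only decides which proxy to use and how long to wait. Fingerprint rotation and interaction simulation happen at the browser level and are outside the scope of this sketch.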
### Batch Crawling

Crawl hundreds of pages concurrently with rate limiting:

```python
urls = ["https://site.com/page1", "https://site.com/page2"]

# Run inside the `async with AsyncWebCrawler() as crawler:` block from Quick Use.
results = await crawler.arun_many(urls, max_concurrent=10)
```

### Key Stats

- 25,000+ GitHub stars
- 300+ contributors
- Supports 50+ website protection bypasses
- Output formats: Markdown, JSON, HTML, screenshots
- Python 3.8+ compatible

### FAQ

**Q: What is Crawl4AI?**
A: Crawl4AI is an open-source Python web crawler that extracts clean markdown from websites, purpose-built for feeding data into LLMs and AI applications.

**Q: Is Crawl4AI free?**
A: Yes, it is fully open source under the Apache 2.0 license. The crawler itself requires no API keys or paid plans; LLM-based extraction uses whichever model provider you configure.

**Q: How does Crawl4AI compare to Scrapy?**
A: Crawl4AI focuses on AI/LLM use cases, with built-in markdown extraction and JavaScript rendering. Scrapy is a general-purpose framework that requires more setup for AI pipelines.

---

## Source & Thanks

> Created by [unclecode](https://github.com/unclecode). Licensed under Apache 2.0.
>
> [crawl4ai](https://github.com/unclecode/crawl4ai): ⭐ 25,000+

Thanks to the Crawl4AI team for building the go-to web crawler for the AI era.

---

Source: https://tokrepo.com/en/workflows/cb733d4d-f66b-477d-b69b-61d3322ad8dd
Author: Script Depot