Scripts · Apr 2, 2026 · 2 min read

# Crawl4AI — LLM-Friendly Web Crawler

Open-source web crawler that outputs clean Markdown for AI. Structured extraction, browser automation, anti-bot handling. 63K+ stars.

TokRepo Featured · Community
## Quick Use

Use it first, then decide how deep to go

The snippets below cover what to install and run first.

```bash
pip install -U crawl4ai
crawl4ai-setup  # installs Playwright browsers
```

```python
import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url="https://example.com")
        print(result.markdown)  # Clean markdown output

asyncio.run(main())
```

Or use the REST API: `crawl4ai-server` → `POST http://localhost:11235/crawl`
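For pipeline use, the REST endpoint can be called from any HTTP client. A minimal sketch with Python's `requests`, assuming a server on the default port 11235; the payload shape (a `urls` list) is an assumption here, so verify it against your server version's docs:

```python
import requests

def build_payload(url: str) -> dict:
    # Payload shape is an assumption based on the Docker API docs;
    # check your server version for the exact schema.
    return {"urls": [url]}

def crawl_via_api(url: str, endpoint: str = "http://localhost:11235/crawl") -> dict:
    resp = requests.post(endpoint, json=build_payload(url), timeout=60)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    try:
        print(crawl_via_api("https://example.com"))
    except requests.exceptions.ConnectionError:
        print("No Crawl4AI server reachable on localhost:11235")
```

This keeps language choice out of the pipeline: anything that can POST JSON can consume the crawler.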
## Introduction

Crawl4AI is an **open-source web crawler purpose-built for AI and LLM applications**. Unlike traditional scrapers that output raw HTML, Crawl4AI converts web pages into clean, structured Markdown optimized for feeding into language models.

Core capabilities:

- **LLM-Optimized Output** — Converts any web page into clean Markdown with proper headings, lists, code blocks, and links preserved. Strips ads, navigation, and boilerplate automatically
- **Structured Data Extraction** — Define JSON schemas and extract structured data from pages using LLMs or CSS/XPath selectors
- **Browser Automation** — Built on Playwright for JavaScript-rendered pages. Handle infinite scroll, click-to-expand, and dynamic content
- **Anti-Bot Protection** — Automatic proxy rotation, stealth mode, CAPTCHA handling, and human-like browsing patterns
- **Batch Crawling** — Crawl thousands of pages concurrently with configurable rate limiting and session management
- **Media Extraction** — Download images, videos, and files alongside text content
- **Docker Deployment** — Production-ready Docker image with REST API for team and pipeline use

63,000+ GitHub stars. The most popular open-source web crawler for AI applications.

## FAQ

**Q: How is Crawl4AI different from BeautifulSoup or Scrapy?**

A: BeautifulSoup and Scrapy output raw HTML that needs extensive cleaning for LLMs. Crawl4AI outputs clean Markdown directly, handles JavaScript-rendered pages, and includes built-in LLM extraction capabilities. It's designed specifically for the AI/LLM use case.

**Q: Can it handle JavaScript-heavy single-page apps?**

A: Yes. Crawl4AI uses Playwright under the hood, so it fully renders JavaScript before extracting content. You can also wait for specific elements, scroll pages, and interact with dynamic content.

**Q: Is it free for commercial use?**

A: Yes, it's Apache 2.0 licensed. Fully free for personal and commercial use.
**Q: How fast is it?**

A: With async crawling, Crawl4AI can process 100+ pages per minute depending on target site response times. The concurrent architecture makes it significantly faster than sequential scrapers.

## Works With

- Python 3.9+ with async/await
- Playwright for browser rendering
- OpenAI / Anthropic / local LLMs for structured extraction
- Docker for production deployment
- REST API for integration with any language
## Source & Thanks

- GitHub: [unclecode/crawl4ai](https://github.com/unclecode/crawl4ai)
- License: Apache 2.0
- Stars: 63,000+
- Maintainer: Unclecode & Crawl4AI community

Thanks to Unclecode for building the go-to web crawler for the AI era, solving the critical problem of converting messy web content into clean, LLM-ready data.
