Quick Use
- Install: pip install crawl4ai
- Run setup once: crawl4ai-setup (installs Playwright browsers)
- Use the AsyncWebCrawler snippet below in your Python script
Intro
Crawl4AI is an LLM-first async web crawler: give it a URL and it returns clean markdown ready to drop into a RAG pipeline. Version 0.5 adds adaptive crawling (the crawler knows when to stop), session-based crawling for SPAs, and a Memory-Adaptive Dispatcher that scales to thousands of URLs without exhausting RAM.
Best for: RAG pipelines, knowledge-base ingestion, agents that need fresh web content.
Works with: Python 3.10+, Playwright.
Setup time: about 2 minutes (pip install crawl4ai && crawl4ai-setup).
Hello world
    import asyncio
    from crawl4ai import AsyncWebCrawler

    async def main():
        async with AsyncWebCrawler() as crawler:
            result = await crawler.arun(url="https://news.ycombinator.com")
            print(result.markdown)  # clean markdown, no HTML

    asyncio.run(main())

Adaptive crawling
The 0.5 release added adaptive strategies — the crawler decides when it has "enough" to answer the user's intent and stops, instead of always crawling N pages.
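To make the stopping rule concrete, here is a minimal, library-free sketch of the idea: keep crawling until a confidence score reaches a threshold or a page budget runs out. The in-memory PAGES dict, the coverage() score (fraction of query terms seen so far), and the adaptive_crawl() helper are all assumptions made up for illustration; they are not how Crawl4AI computes confidence.

```python
from collections import deque

# Hypothetical in-memory "site": url -> (page text, outgoing links).
PAGES = {
    "/": ("python documentation index", ["/asyncio", "/re"]),
    "/asyncio": ("the asyncio event loop schedules coroutines", ["/asyncio/tasks"]),
    "/re": ("regular expression syntax", []),
    "/asyncio/tasks": ("tasks wrap coroutines for dispatch", []),
}


def coverage(query_terms, seen_text):
    """Toy confidence score: fraction of query terms found in the text so far."""
    words = set(seen_text.split())
    return sum(term in words for term in query_terms) / len(query_terms)


def adaptive_crawl(start, query, confidence_threshold=0.85, max_pages=50):
    """Breadth-first crawl that stops at the confidence threshold or page budget."""
    query_terms = query.lower().split()
    queue, visited, seen_text = deque([start]), [], ""
    confidence = 0.0
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited or url not in PAGES:
            continue
        text, links = PAGES[url]
        visited.append(url)
        seen_text += " " + text
        confidence = coverage(query_terms, seen_text)
        if confidence >= confidence_threshold:
            break  # enough information gathered; leave the rest of the queue
        queue.extend(links)
    return visited, confidence


visited, confidence = adaptive_crawl("/", "asyncio event loop coroutines")
print(visited, round(confidence, 2))  # ['/', '/asyncio'] 1.0
```

The crawl stops after two of the four pages because the second page already covers every query term. A fixed "crawl N pages" policy would have fetched all four.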
    import asyncio
    from crawl4ai import AsyncWebCrawler, AdaptiveCrawler, AdaptiveConfig

    config = AdaptiveConfig(
        confidence_threshold=0.85,  # stop when 85% confident
        max_pages=50,
    )

    async def main():
        async with AsyncWebCrawler() as crawler:
            # AdaptiveCrawler wraps an existing AsyncWebCrawler instance
            adaptive = AdaptiveCrawler(crawler, config)
            result = await adaptive.digest(
                start_url="https://docs.python.org",
                query="How does the asyncio event loop dispatch coroutines?",
            )
            # keep only the most relevant pages
            relevant_pages = adaptive.get_relevant_content(top_k=5)

    asyncio.run(main())

Memory-Adaptive Dispatcher (1000s of URLs)
    import asyncio
    from crawl4ai import AsyncWebCrawler, MemoryAdaptiveDispatcher, CrawlerMonitor

    async def crawl_many(urls):  # urls: list of 5000+ URLs
        dispatcher = MemoryAdaptiveDispatcher(
            memory_threshold_percent=70.0,  # pause new launches above 70% RAM
            monitor=CrawlerMonitor(),
        )
        async with AsyncWebCrawler() as crawler:
            return await crawler.arun_many(
                urls=urls,
                dispatcher=dispatcher,
            )

    # results = asyncio.run(crawl_many(urls))

When RAM usage reaches 70%, the dispatcher pauses new launches until memory frees up, which helps avoid OOM crashes on long crawls.
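The pausing behaviour can be pictured as a gate in front of each launch. The sketch below is a simplified, standard-library illustration of that idea and not the dispatcher's actual code. run_with_memory_gate(), the memory_percent callable, and the fake readings are hypothetical names and values introduced for the example.

```python
import asyncio


async def run_with_memory_gate(jobs, memory_percent, threshold=70.0,
                               poll_interval=0.01):
    """Start jobs one by one, pausing launches while memory is above threshold.

    `jobs` is a list of zero-argument coroutine functions; `memory_percent`
    is a hypothetical callable returning current RAM usage (0-100).
    """
    tasks, pauses = [], 0
    for job in jobs:
        while memory_percent() >= threshold:
            pauses += 1                      # count each wait for visibility
            await asyncio.sleep(poll_interval)
        tasks.append(asyncio.create_task(job()))
    results = await asyncio.gather(*tasks)
    return results, pauses


async def demo():
    # Fake memory readings: usage spikes to 85% once, then recovers.
    readings = iter([40.0, 85.0, 60.0, 50.0])

    def fake_memory_percent():
        return next(readings, 50.0)

    async def fetch(i):
        await asyncio.sleep(0)               # stand-in for a real page fetch
        return f"page-{i}"

    jobs = [lambda i=i: fetch(i) for i in range(3)]
    return await run_with_memory_gate(jobs, fake_memory_percent)


results, pauses = asyncio.run(demo())
print(results, pauses)  # ['page-0', 'page-1', 'page-2'] 1
```

Already running tasks are left alone; only new launches wait. In a real setup the reading could come from something like psutil.virtual_memory().percent, and a production dispatcher would also cap concurrency, which this sketch leaves out.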
Output formats
- result.markdown: clean markdown
- result.markdown_v2: markdown with citations preserved
- result.fit_markdown: content trimmed to fit an LLM context window
- result.media: extracted images and videos
- result.links: links classified as internal or external
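As a rough illustration of what classifying links as internal or external involves, the sketch below compares each link's host with the host of the crawled page, using only the standard library. classify_links() is a hypothetical helper and the returned dict shape is chosen for the example; treat it as an assumption about the general technique, not as Crawl4AI's implementation.

```python
from urllib.parse import urljoin, urlparse


def classify_links(base_url, hrefs):
    """Split hrefs into internal/external by comparing hosts with base_url."""
    base_host = urlparse(base_url).netloc.lower()
    links = {"internal": [], "external": []}
    for href in hrefs:
        absolute = urljoin(base_url, href)   # resolve relative links
        host = urlparse(absolute).netloc.lower()
        kind = "internal" if host == base_host else "external"
        links[kind].append(absolute)
    return links


links = classify_links(
    "https://news.ycombinator.com/news",
    ["/newest", "item?id=1", "https://github.com/unclecode/crawl4ai"],
)
print(links["internal"])
print(links["external"])
```

An exact host comparison treats subdomains as external. Whether that is the right call depends on the site, so a real crawler may compare registered domains instead.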
FAQ
Q: Is Crawl4AI free?
A: Yes. The library is open source under Apache-2.0. Playwright, which it uses for JS rendering, is also free and is installed by crawl4ai-setup.
Q: How does this differ from Firecrawl?
A: Firecrawl is a hosted SaaS API ($/scrape). Crawl4AI is a Python library you self-host. Both produce clean markdown; the difference is the deployment model. Crawl4AI also exposes more knobs for adaptive crawling and dispatcher control.
Q: Does it handle JavaScript-rendered pages?
A: Yes. Crawl4AI uses Playwright under the hood for JS execution. Set js_code="..." to run custom JavaScript, wait_for="selector" to wait for specific elements, or screenshot=True for visual capture.
Source & Thanks
Built by unclecode. Licensed under Apache-2.0.
unclecode/crawl4ai — ⭐ 30,000+