Scripts · May 7, 2026 · 4 min read

Crawl4AI 0.5 — Async LLM-Friendly Web Crawler

Crawl4AI 0.5 is the async Python crawler for RAG. Outputs clean markdown, no HTML cleanup. Adaptive crawling, JS rendering, AsyncWebCrawler API. 30K stars.

Crawl4AI · Community

Agent ready

Agent-ready installation

This asset can be installed after choosing the runtime, reviewing the plan, and running the corresponding command.

Native · 98/100 · Policy: allow

Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Community
Entry: Asset

Direct install command

npx -y tokrepo@latest install 0793dfd1-25c8-4a72-b272-01604d25ceb3 --target codex

Run after confirming the plan with a dry run.

Introduction

Crawl4AI is an LLM-first async web crawler: give it a URL and it returns clean markdown ready to drop into a RAG pipeline. Version 0.5 adds adaptive crawling (the crawler knows when to stop), session-based crawling for SPAs, and a Memory-Adaptive Dispatcher that scales to thousands of URLs without exhausting RAM. Best for: RAG pipelines, knowledge-base ingestion, and agents that need fresh web content. Works with: Python 3.10+ and Playwright. Setup time: about 2 minutes (pip install crawl4ai && crawl4ai-setup).

Hello world

import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url="https://news.ycombinator.com")
        print(result.markdown)  # clean markdown, no HTML

asyncio.run(main())

Adaptive crawling

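With adaptive strategies, the crawler decides when it has gathered enough to answer the user's query and stops, instead of always crawling a fixed number of pages. AdaptiveCrawler is not exported by every release, so confirm that your installed version provides it.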

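import asyncio
from crawl4ai import AsyncWebCrawler, AdaptiveCrawler, AdaptiveConfig

config = AdaptiveConfig(
    confidence_threshold=0.85,  # stop when 85% confident
    max_pages=50,               # hard cap on pages crawled
)

async def main():
    # AdaptiveCrawler wraps an existing AsyncWebCrawler instance
    async with AsyncWebCrawler() as crawler:
        adaptive = AdaptiveCrawler(crawler, config)
        result = await adaptive.digest(
            start_url="https://docs.python.org",
            query="How does the asyncio event loop dispatch coroutines?",
        )
        # result.pages contains only the relevant subset

asyncio.run(main())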
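
The stop-when-confident idea can be illustrated without the library. The sketch below is not Crawl4AI's scoring algorithm. It assumes a deliberately crude confidence score (the fraction of query terms seen so far) and stops once that score reaches the threshold or the page cap is hit. The names tokenize, coverage, adaptive_crawl, and the in-memory PAGES list are hypothetical and exist only for illustration.

```python
import re

def tokenize(text):
    """Lowercase word tokens; a crude stand-in for real relevance scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def coverage(query_terms, seen_terms):
    """Fraction of query terms that appear in the text collected so far."""
    if not query_terms:
        return 1.0
    return len(query_terms & seen_terms) / len(query_terms)

def adaptive_crawl(pages, query, confidence_threshold=0.85, max_pages=50):
    """Visit pages in order and stop once coverage reaches the threshold.

    `pages` is a list of (url, text) pairs standing in for fetched pages.
    Returns (visited_urls, final_confidence).
    """
    query_terms = tokenize(query)
    seen, visited = set(), []
    confidence = coverage(query_terms, seen)
    for url, text in pages[:max_pages]:
        if confidence >= confidence_threshold:
            break  # enough information gathered; skip the remaining pages
        visited.append(url)
        seen |= tokenize(text)
        confidence = coverage(query_terms, seen)
    return visited, confidence

# Hypothetical pre-fetched pages (no network access in this sketch).
PAGES = [
    ("https://example.com/a", "The event loop runs coroutines"),
    ("https://example.com/b", "asyncio dispatch is cooperative"),
    ("https://example.com/c", "Unrelated release notes"),
]

visited, confidence = adaptive_crawl(PAGES, "asyncio event loop dispatch coroutines")
print(visited, confidence)
```

Here the third page is never visited, because the first two already cover every query term. A real implementation would score relevance far more carefully, but the control flow (score, compare to threshold, stop early) is the same shape.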

Memory-Adaptive Dispatcher (1000s of URLs)

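import asyncio
from crawl4ai import AsyncWebCrawler, MemoryAdaptiveDispatcher, CrawlerMonitor

# Placeholder list; in practice this would hold 5000+ URLs
urls = ["https://example.com/page1", "https://example.com/page2"]

async def main():
    dispatcher = MemoryAdaptiveDispatcher(
        memory_threshold_percent=70.0,  # pause new launches above 70% RAM
        monitor=CrawlerMonitor(),       # live progress display
    )
    async with AsyncWebCrawler() as crawler:
        results = await crawler.arun_many(
            urls=urls,
            dispatcher=dispatcher,
        )
    print(f"Crawled {len(results)} URLs")

asyncio.run(main())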

When RAM usage reaches 70%, the dispatcher pauses new launches until memory frees up, which reduces the risk of out-of-memory crashes on long crawls.
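
The pause-on-threshold behaviour can be sketched with plain asyncio. This is an illustration of the idea under stated assumptions, not the library's dispatcher: dispatch, fake_fetch, and fake_memory_percent are hypothetical names, and the memory probe replays canned readings so the example runs anywhere. In real code the probe could be something like psutil.virtual_memory().percent.

```python
import asyncio

async def dispatch(urls, fetch, get_memory_percent,
                   memory_threshold_percent=70.0, max_concurrent=4,
                   check_interval=0.01):
    """Launch fetch(url) tasks, pausing new launches while memory is high."""
    semaphore = asyncio.Semaphore(max_concurrent)
    results = {}

    async def worker(url):
        try:
            results[url] = await fetch(url)
        finally:
            semaphore.release()  # free a slot for the next launch

    tasks = []
    for url in urls:
        # Back off while the probe reports memory at or above the threshold.
        while get_memory_percent() >= memory_threshold_percent:
            await asyncio.sleep(check_interval)
        await semaphore.acquire()  # also cap the number of in-flight tasks
        tasks.append(asyncio.create_task(worker(url)))
    await asyncio.gather(*tasks)
    return results

# Hypothetical probe: replays canned readings, then reports 50%.
readings = iter([50.0, 90.0, 90.0, 50.0])
stats = {"pauses": 0}

def fake_memory_percent():
    value = next(readings, 50.0)
    if value >= 70.0:
        stats["pauses"] += 1
    return value

async def fake_fetch(url):
    await asyncio.sleep(0)  # stand-in for network I/O
    return f"markdown for {url}"

urls = [f"https://example.com/{i}" for i in range(3)]
results = asyncio.run(dispatch(urls, fake_fetch, fake_memory_percent))
print(len(results), "pages fetched;", stats["pauses"], "high-memory readings")
```

Already-running tasks are left alone; only new launches wait. That is the design choice that lets memory drain naturally as in-flight pages finish.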

Output formats

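result.markdown: clean markdown (in 0.5 this is a result object; result.markdown.raw_markdown is the plain string)
result.markdown.markdown_with_citations: markdown with links converted to numbered citations (replaces the deprecated result.markdown_v2)
result.markdown.fit_markdown: markdown with low-value content filtered out; populated when a content filter is configured
result.media: extracted images and videos
result.links: links classified as internal or external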
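
To make the internal/external split concrete: one common rule is to compare each link's host with the host of the page it came from. The sketch below applies that rule using only urllib.parse. It is an assumption about how such classification can work, not Crawl4AI's code, and classify_links is a hypothetical helper.

```python
from urllib.parse import urljoin, urlparse

def classify_links(base_url, hrefs):
    """Split hrefs into internal and external links relative to base_url."""
    base_host = urlparse(base_url).netloc.lower()
    links = {"internal": [], "external": []}
    for href in hrefs:
        absolute = urljoin(base_url, href)  # resolve relative paths
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar schemes
        kind = "internal" if parsed.netloc.lower() == base_host else "external"
        links[kind].append(absolute)
    return links

links = classify_links(
    "https://news.ycombinator.com/news",
    ["/newest", "item?id=1", "https://example.com/post", "mailto:hn@example.com"],
)
print(links)
```

Note that this rule treats subdomains as external. Whether that is desirable depends on the site being crawled.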

FAQ

Q: Is Crawl4AI free? A: Yes — Apache-2.0 open-source. The library itself is free; Playwright (used for JS rendering) is also free and installs via crawl4ai-setup.

Q: How does this differ from Firecrawl? A: Firecrawl is typically used as a hosted, pay-per-scrape API, while Crawl4AI is a Python library you self-host. Both produce clean markdown; the main difference is the deployment model. Crawl4AI also exposes more controls for adaptive crawling and dispatching.

Q: Does it handle JavaScript-rendered pages? A: Yes. Crawl4AI uses Playwright under the hood for JS execution. On the run configuration (CrawlerRunConfig), set js_code="..." to run custom JavaScript, wait_for="css:<selector>" to wait for specific elements, or screenshot=True for visual capture.

Quick Use

1. Install the package: pip install crawl4ai
2. Run setup once: crawl4ai-setup (installs Playwright browsers)
3. Use the AsyncWebCrawler snippet from the Hello world section in your Python script

Source & Thanks

Built by unclecode. Licensed under Apache-2.0.

unclecode/crawl4ai — ⭐ 30,000+
