Scripts · Mar 31, 2026 · 2 min read

ScrapeGraphAI — AI-Powered Web Scraping

Python scraping library powered by LLMs. Describe what you want to extract in natural language, get structured data back. Handles dynamic pages. 23K+ stars.

TokRepo Picks · Community
Quick Use

Use it first, then decide how deep to go

Install (shell):

pip install scrapegraphai
playwright install

Run (Python):

from scrapegraphai.graphs import SmartScraperGraph

graph = SmartScraperGraph(
    prompt="Extract all article titles and their authors",
    source="https://news.ycombinator.com",
    config={"llm": {"model": "openai/gpt-4o", "api_key": "sk-..."}}
)

result = graph.run()
print(result)
# [{"title": "...", "author": "..."}, ...]

Intro

ScrapeGraphAI is a Python web scraping library that uses LLMs to extract structured data from websites. Instead of writing CSS selectors or XPath, describe what you want in natural language. It handles dynamic JavaScript pages (via Playwright), follows pagination, and returns clean structured data. Works with OpenAI, Anthropic, Google, and local models via Ollama. 23,000+ GitHub stars, MIT licensed.

Best for: Developers who need structured data extraction from websites without writing scrapers

Works with: OpenAI, Anthropic, Google, Ollama, Groq, and other LLM providers


Key Features

Natural Language Extraction

Describe what you want — the LLM figures out how to extract it:

prompt = "Get all product names, prices, and ratings from this page"

Multiple Graph Types

Graph                    Use Case
SmartScraperGraph        Single page extraction
SearchGraph              Search + extract from results
SpeechGraph              Extract + convert to audio
ScriptCreatorGraph       Generate reusable scraper code
SmartScraperMultiGraph   Multi-page extraction
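As a rough sketch of how the multi-page graph might be used: the helper below reuses the config shape from Quick Use and passes a list of URLs as `source`. The function names `build_config` and `scrape_many` are hypothetical, and the assumption that `SmartScraperMultiGraph` takes the same `prompt`/`source`/`config` arguments as `SmartScraperGraph` should be checked against the library version you install.

```python
from typing import Any


def build_config(model: str, api_key: str) -> dict[str, Any]:
    """Return a config dict in the same shape as the Quick Use example."""
    return {"llm": {"model": model, "api_key": api_key}}


def scrape_many(prompt: str, urls: list[str], config: dict[str, Any]) -> Any:
    """Run one prompt across several pages (hypothetical helper)."""
    # Imported lazily so build_config works without the library installed.
    from scrapegraphai.graphs import SmartScraperMultiGraph

    # Assumption: source accepts a list of URLs for the multi-page graph.
    graph = SmartScraperMultiGraph(prompt=prompt, source=urls, config=config)
    return graph.run()
```

A call would look like `scrape_many("Get all product names and prices", [url_1, url_2], build_config("openai/gpt-4o", "sk-..."))`, where the URLs are whatever pages you want to compare.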

Dynamic Pages

Built-in Playwright support for JavaScript-rendered content. Handles SPAs, infinite scroll, and AJAX.

Structured Output

Returns a JSON-serializable Python dict shaped by your prompt, so results usually need little post-processing.
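Because the output shape follows the prompt and the model, a light validation pass can still be worthwhile before the data goes anywhere important. The sketch below is one possible approach using only the standard library; the `Article` record and `parse_articles` helper are hypothetical, and the assumption is that the result is either a list of row dicts or a dict wrapping such a list.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Article:
    title: str
    author: str


def parse_articles(result: Any) -> list[Article]:
    """Convert a raw extraction result into typed records, skipping bad rows."""
    # Assumption: rows arrive as a list, or as the first list value in a dict.
    if isinstance(result, dict):
        lists = [v for v in result.values() if isinstance(v, list)]
        rows = lists[0] if lists else []
    elif isinstance(result, list):
        rows = result
    else:
        rows = []

    articles = []
    for row in rows:
        if not isinstance(row, dict):
            continue
        title, author = row.get("title"), row.get("author")
        # Keep only rows where both fields are present and are strings.
        if isinstance(title, str) and isinstance(author, str):
            articles.append(Article(title=title, author=author))
    return articles
```

Rows that are missing a field are dropped here rather than raising, which suits exploratory scraping; a stricter pipeline might prefer to raise instead.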

Local Models

Run the LLM locally with Ollama, so page content is not sent to cloud APIs. Fetching the target pages still requires network access.
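A local setup might swap the `llm` section of the config for something like the following. This is a sketch, not the definitive configuration: the `ollama/` model prefix mirrors the `openai/` prefix in Quick Use, while the `format` and `base_url` keys, the model name `llama3.2`, and the helper name `ollama_config` are assumptions to verify against the library's documentation and your Ollama install.

```python
from typing import Any


def ollama_config(model: str = "llama3.2",
                  base_url: str = "http://localhost:11434") -> dict[str, Any]:
    """Build a config dict that points the scraper at a local Ollama server."""
    return {
        "llm": {
            "model": f"ollama/{model}",  # provider prefix, as in "openai/gpt-4o"
            "format": "json",            # assumption: ask the model for JSON output
            "base_url": base_url,        # assumption: default Ollama address
        },
    }
```

The returned dict would be passed as `config=` in place of the OpenAI config shown in Quick Use; no API key is included because the model is served locally.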


FAQ

Q: What is ScrapeGraphAI?
A: An AI-powered Python scraping library. Describe what you want to extract in natural language, get structured data back. Handles dynamic JS pages. 23K+ stars.

Q: Is it legal to scrape websites with ScrapeGraphAI?
A: ScrapeGraphAI is a tool; legality depends on the target site's terms of service and your jurisdiction. Always respect robots.txt and rate limits.
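One way to act on the robots.txt and rate-limit advice is to gate each URL before handing it to a graph. The sketch below uses only the Python standard library and is independent of ScrapeGraphAI; `allowed_by_robots` and `RateLimiter` are hypothetical helper names, and the robots.txt text is assumed to have been fetched already.

```python
import time
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check a URL against robots.txt content that was already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class RateLimiter:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last = None  # monotonic timestamp of the previous call

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
                slept = remaining
        self._last = time.monotonic()
        return slept
```

In a scraping loop, call `limiter.wait()` before each request and skip any URL for which `allowed_by_robots` returns `False`.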


Source & Thanks

Created by ScrapeGraphAI. Licensed under MIT. Repository: ScrapeGraphAI/Scrapegraph-ai (23,000+ GitHub stars).
