Scripts · Mar 31, 2026 · 2 min read

ScrapeGraphAI — AI-Powered Web Scraping

Python scraping library powered by LLMs. Describe what you want to extract in natural language, get structured data back. Handles dynamic pages. 23K+ stars.

TL;DR
ScrapeGraphAI lets you describe what to extract in plain English and returns structured data from any website.
§01

What it is

ScrapeGraphAI is a Python web scraping library that uses large language models to extract structured data from websites. Instead of writing CSS selectors or XPath queries, you describe what you want in natural language and the LLM figures out how to extract it.

ScrapeGraphAI targets developers who need data extraction from websites without building custom scrapers for each site. It works with OpenAI, Anthropic, Google, and local models via Ollama.
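Provider selection happens entirely in the config dictionary passed to the graph. A minimal sketch of what hosted-provider configs might look like; the exact model identifier strings are assumptions, so check the ScrapeGraphAI docs for what your installed version accepts:

```python
# Illustrative `config` dictionaries for two hosted providers.
# Model names below are assumptions, not an exhaustive or verified list.

def make_config(model: str, api_key: str) -> dict:
    """Build the LLM section of a ScrapeGraphAI-style config dict."""
    return {"llm": {"model": model, "api_key": api_key}}

openai_config = make_config("openai/gpt-4o", "sk-...")
anthropic_config = make_config("anthropic/claude-3-5-sonnet-20240620", "sk-ant-...")
```

Switching providers is then a one-line change to the dictionary, with no changes to the extraction prompt itself.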

§02

How it saves time or tokens

Traditional scraping requires writing and maintaining selectors that break whenever a site changes its HTML structure. ScrapeGraphAI abstracts this away: the LLM adapts to different page layouts without code changes. A typical extraction run costs roughly 500 tokens.
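To see why selector maintenance is the pain point, consider a conventional scraper built on the standard library's html.parser: one class rename in the site's markup and it silently returns nothing, while a natural-language prompt needs no change. The HTML snippets and class names below are invented for illustration.

```python
from html.parser import HTMLParser

class TitleScraper(HTMLParser):
    """Collects text inside <a class="storylink"> tags -- brittle by design."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # Matches one exact tag/class pair; any markup change defeats it.
        if tag == "a" and ("class", "storylink") in attrs:
            self.in_title = True

    def handle_data(self, data):
        if self.in_title:
            self.titles.append(data)
            self.in_title = False

old_html = '<a class="storylink">Show HN: My project</a>'
new_html = '<a class="titleline">Show HN: My project</a>'  # site renamed the class

for page in (old_html, new_html):
    s = TitleScraper()
    s.feed(page)
    print(s.titles)
# ['Show HN: My project']
# []   <- the renamed class breaks extraction, with no error raised
```

The second run fails silently, which is exactly the maintenance burden the prompt-based approach sidesteps.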

§03

How to use

  1. Install ScrapeGraphAI and Playwright:
pip install scrapegraphai
playwright install
  2. Create a SmartScraperGraph with your prompt and target URL.
  3. Call .run() to get structured data back as a Python dictionary.
§04

Example

from scrapegraphai.graphs import SmartScraperGraph

graph = SmartScraperGraph(
    prompt='Extract all article titles and their authors',
    source='https://news.ycombinator.com',
    config={'llm': {'model': 'openai/gpt-4o', 'api_key': 'sk-...'}}
)

result = graph.run()
print(result)
# {'articles': [{'title': '...', 'author': '...'}, ...]}  (a dict; exact keys depend on the prompt)
§05

Common pitfalls

  • ScrapeGraphAI requires Playwright for dynamic JavaScript-rendered pages. Without it, only static HTML is parsed.
  • LLM token costs can add up when scraping many pages. Use local models via Ollama for high-volume extraction to reduce API costs.
  • The quality of extraction depends heavily on the prompt specificity. Vague prompts like 'get everything' produce inconsistent results.
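For the cost pitfall above, moving to a local model is a config-only change. A hedged sketch of a typical Ollama setup; the model name and default port here are assumptions you should match to whatever model you have pulled locally:

```python
# Swap the hosted LLM for a local Ollama model to cap per-page API cost.
# "ollama/llama3" and the port are illustrative, not verified identifiers.
local_config = {
    "llm": {
        "model": "ollama/llama3",
        "base_url": "http://localhost:11434",  # Ollama's default endpoint
        "temperature": 0,                      # favor deterministic extraction
    },
    "verbose": False,
}
# Pass it in place of the hosted config:
# SmartScraperGraph(prompt=..., source=..., config=local_config)
```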

Frequently Asked Questions

What LLM providers does ScrapeGraphAI support?

ScrapeGraphAI supports OpenAI, Anthropic, Google, Groq, and local models via Ollama. You configure the provider and model in the config dictionary when creating a graph instance.

Does ScrapeGraphAI handle JavaScript-rendered pages?

Yes. ScrapeGraphAI uses Playwright under the hood to render dynamic pages before extraction. You need to run 'playwright install' to set up the browser binaries.

How does ScrapeGraphAI differ from traditional scraping libraries?

Traditional libraries like BeautifulSoup and Scrapy require you to write CSS selectors or XPath. ScrapeGraphAI uses natural language prompts instead, letting the LLM determine how to locate and extract the target data.

Can I use ScrapeGraphAI with local models?

Yes. ScrapeGraphAI integrates with Ollama for local model inference. This is useful for high-volume scraping where API costs would be prohibitive. Set the model to an Ollama endpoint in your config.

What graph types does ScrapeGraphAI provide?

ScrapeGraphAI offers SmartScraperGraph for single-page extraction, SearchGraph for search-engine-based extraction, and SpeechGraph for audio-to-text extraction. SmartScraperGraph is the most commonly used.


Source & Thanks

Created by ScrapeGraphAI. Licensed under MIT. Repository: ScrapeGraphAI/Scrapegraph-ai (23,000+ GitHub stars).
