Scripts · March 31, 2026 · 1 min read

ScrapeGraphAI — AI-Powered Web Scraping

Python scraping library powered by LLMs. Describe what you want to extract in natural language, get structured data back. Handles dynamic pages. 23K+ stars.

TokRepo Picks · Community
Quick Start

Try it first, then decide whether to dig deeper

This section tells both users and agents what to copy first, what to install, and where it goes.

pip install scrapegraphai
playwright install

from scrapegraphai.graphs import SmartScraperGraph

graph = SmartScraperGraph(
    prompt="Extract all article titles and their authors",
    source="https://news.ycombinator.com",
    config={"llm": {"model": "openai/gpt-4o", "api_key": "sk-..."}}
)

result = graph.run()
print(result)
# [{"title": "...", "author": "..."}, ...]
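The quick-start snippet hard-codes the API key. A minimal sketch of reading it from the environment instead is shown here; the variable name `OPENAI_API_KEY` and the helpers `build_config` and `scrape` are assumptions made for illustration, not part of the library.

```python
import os


def build_config(model: str = "openai/gpt-4o", key_env: str = "OPENAI_API_KEY") -> dict:
    """Build the quick-start config dict, reading the API key from the environment."""
    api_key = os.environ.get(key_env)
    if not api_key:
        raise RuntimeError(f"Set the {key_env} environment variable first")
    return {"llm": {"model": model, "api_key": api_key}}


def scrape(prompt: str, source: str) -> object:
    """Run a single-page extraction with the same call as the quick start."""
    # Imported lazily so build_config can be used without the library installed.
    from scrapegraphai.graphs import SmartScraperGraph

    graph = SmartScraperGraph(prompt=prompt, source=source, config=build_config())
    return graph.run()
```

Calling `scrape("Extract all article titles and their authors", "https://news.ycombinator.com")` would then behave like the quick start, with the key kept out of the source file.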

Introduction

ScrapeGraphAI is a Python web scraping library that uses LLMs to extract structured data from websites. Instead of writing CSS selectors or XPath, describe what you want in natural language. It handles dynamic JavaScript pages (via Playwright), follows pagination, and returns clean structured data. Works with OpenAI, Anthropic, Google, and local models via Ollama. 23,000+ GitHub stars, MIT licensed.

Best for: Developers who need structured data extraction from websites without writing scrapers
Works with: OpenAI, Anthropic, Google, Ollama, Groq, any LLM


Key Features

Natural Language Extraction

Describe what you want — the LLM figures out how to extract it:

prompt = "Get all product names, prices, and ratings from this page"

Multiple Graph Types

Graph                    Use Case
SmartScraperGraph        Single page extraction
SearchGraph              Search + extract from results
SpeechGraph              Extract + convert to audio
ScriptCreatorGraph       Generate reusable scraper code
SmartScraperMultiGraph   Multi-page extraction
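One way to read the table above is as a choice driven by how many URLs you already have. The sketch below encodes that choice; `pick_graph_name` and `run` are hypothetical helpers, and the constructor arguments for `SearchGraph` and `SmartScraperMultiGraph` are assumed to mirror the quick start, so check them against your installed version.

```python
from typing import List, Optional


def pick_graph_name(urls: Optional[List[str]]) -> str:
    """Map the inputs to a graph class name from the table."""
    if not urls:
        return "SearchGraph"           # no URL known: search, then extract
    if len(urls) == 1:
        return "SmartScraperGraph"     # single page
    return "SmartScraperMultiGraph"    # several pages, same prompt


def run(prompt: str, config: dict, urls: Optional[List[str]] = None) -> object:
    """Instantiate the chosen graph and run it (argument names are assumptions)."""
    # Imported lazily so pick_graph_name works without the library installed.
    from scrapegraphai import graphs

    name = pick_graph_name(urls)
    graph_cls = getattr(graphs, name)
    if name == "SearchGraph":
        graph = graph_cls(prompt=prompt, config=config)
    elif name == "SmartScraperGraph":
        graph = graph_cls(prompt=prompt, source=urls[0], config=config)
    else:
        graph = graph_cls(prompt=prompt, source=urls, config=config)
    return graph.run()
```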

Dynamic Pages

Built-in Playwright support for JavaScript-rendered content. Handles SPAs, infinite scroll, and AJAX.

Structured Output

Returns clean JSON/dict matching your prompt. No post-processing needed.
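Because the output shape follows the prompt, a light check on the returned value can still be useful before passing it on. The following is a sketch only: `to_records` is a hypothetical helper, and the assumption that a dict result wraps its records in a list value is mine, not something the library documents here.

```python
from typing import Any, Dict, List


def to_records(result: Any, required: List[str]) -> List[Dict[str, Any]]:
    """Return the dict records in `result` that contain every key in `required`."""
    if isinstance(result, dict):
        # Assumption: a dict result holds the records in its first list value;
        # if it has no list values, treat the dict itself as a single record.
        lists = [value for value in result.values() if isinstance(value, list)]
        rows = lists[0] if lists else [result]
    elif isinstance(result, list):
        rows = result
    else:
        raise TypeError(f"Unexpected result type: {type(result).__name__}")
    return [row for row in rows
            if isinstance(row, dict) and all(key in row for key in required)]
```

For the quick-start prompt, `to_records(result, ["title", "author"])` would keep only the entries that carry both fields.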

Local Models

Run entirely offline with Ollama — no data sent to cloud APIs.
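A local setup swaps the hosted model in the config for one served by Ollama. The sketch below assumes the `ollama/<model>` naming follows the same `provider/model` pattern as the quick start, that a `base_url` key is accepted, and that Ollama listens on its default port 11434; the model name `llama3.2` and the helper `ollama_config` are illustrative.

```python
def ollama_config(model: str = "llama3.2",
                  base_url: str = "http://localhost:11434") -> dict:
    """Build a config dict that points at a local Ollama server (no API key)."""
    return {
        "llm": {
            "model": f"ollama/{model}",  # assumed provider/model naming
            "base_url": base_url,        # assumed key for the local endpoint
        }
    }
```

The result would be passed as `config=ollama_config()` in place of the OpenAI config from the quick start.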


FAQ

Q: What is ScrapeGraphAI?
A: An AI-powered Python scraping library. Describe what you want to extract in natural language, get structured data back. Handles dynamic JS pages. 23K+ stars.

Q: Is it legal to scrape websites with ScrapeGraphAI?
A: ScrapeGraphAI is a tool — legality depends on the target site's terms of service and your jurisdiction. Always respect robots.txt and rate limits.
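The robots.txt and rate-limit advice can be checked in code before any URL reaches the scraper. This sketch uses only the Python standard library (`urllib.robotparser`); the helpers `make_checker` and `polite_urls` and the one-second default delay are assumptions, and ScrapeGraphAI itself is not involved.

```python
import time
from typing import Callable, Iterable, Iterator
from urllib.robotparser import RobotFileParser


def make_checker(robots_txt: str, user_agent: str = "*") -> Callable[[str], bool]:
    """Parse robots.txt text and return a function that says whether a URL may be fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return lambda url: parser.can_fetch(user_agent, url)


def polite_urls(urls: Iterable[str], allowed: Callable[[str], bool],
                delay_seconds: float = 1.0) -> Iterator[str]:
    """Yield only the allowed URLs, pausing between consecutive ones."""
    first = True
    for url in urls:
        if not allowed(url):
            continue  # skip anything robots.txt disallows
        if not first:
            time.sleep(delay_seconds)  # simple fixed delay as a rate limit
        first = False
        yield url
```

In practice the robots.txt text would be downloaded from the target site first, and each URL yielded by `polite_urls` would be used as the `source` of a scraping run.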



Source and Acknowledgments

Created by ScrapeGraphAI. Licensed under MIT. ScrapeGraphAI/Scrapegraph-ai — 23,000+ GitHub stars
