Configs · March 31, 2026 · 1 min read

GPT Crawler — Build Custom GPTs from Any Website

Crawl a website to generate knowledge files for custom GPTs and RAG pipelines. Output is JSON, ready for OpenAI GPTs or any LLM knowledge base. Minimal configuration. 22K+ GitHub stars.

Introduction

GPT Crawler turns a website into a knowledge file for custom GPTs and RAG pipelines. Point it at documentation, a help center, or any other site: it crawls the pages, extracts their text, and writes structured JSON ready for OpenAI's GPT Builder or any LLM knowledge base. It adds no AI cost because it is a pure crawler, not an LLM app. 22,000+ GitHub stars, ISC licensed.
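The crawl, extract, and emit-JSON pipeline described above can be modeled as a breadth-first walk that follows only allowed links and stops at a page limit. The sketch below is a simplified model, not GPT Crawler's source. An in-memory map of hypothetical pages stands in for real fetching and browser rendering, and the names `crawl`, `PageStub`, and `KnowledgeRecord` are illustrative only.

```typescript
// Simplified model of crawl -> extract -> JSON. The in-memory "site" map is a
// hypothetical stand-in for real HTTP fetches and page rendering.
interface PageStub {
  title: string;
  text: string;     // text already extracted from the page
  links: string[];  // outgoing links found on the page
}

interface KnowledgeRecord {
  title: string;
  url: string;
  text: string;
}

// Breadth-first crawl: start at startUrl, follow only links accepted by
// shouldVisit, and stop once maxPages records have been collected.
function crawl(
  site: Map<string, PageStub>,
  startUrl: string,
  shouldVisit: (url: string) => boolean,
  maxPages: number,
): KnowledgeRecord[] {
  const queue: string[] = [startUrl];
  const seen = new Set<string>([startUrl]);
  const out: KnowledgeRecord[] = [];

  while (queue.length > 0 && out.length < maxPages) {
    const url = queue.shift()!;
    const page = site.get(url);
    if (!page) continue; // unreachable page: skip it
    out.push({ title: page.title, url, text: page.text });

    for (const link of page.links) {
      if (!seen.has(link) && shouldVisit(link)) {
        seen.add(link);
        queue.push(link);
      }
    }
  }
  return out;
}

// Usage with a tiny hypothetical site: only /docs pages should be collected.
const site = new Map<string, PageStub>([
  ["https://example.com/docs", { title: "Docs", text: "Welcome", links: ["https://example.com/docs/a", "https://example.com/blog"] }],
  ["https://example.com/docs/a", { title: "A", text: "Page A", links: ["https://example.com/docs"] }],
  ["https://example.com/blog", { title: "Blog", text: "News", links: [] }],
]);

const records = crawl(site, "https://example.com/docs", (u) => u.startsWith("https://example.com/docs"), 10);
console.log(JSON.stringify(records.map((r) => r.title))); // ["Docs","A"]
```

The `seen` set prevents the same URL from being queued twice, which matters on documentation sites where every page links back to the index.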

Best for: Creating custom GPTs from documentation sites, building RAG knowledge bases from web content
Works with: OpenAI GPTs, Claude Projects, any RAG pipeline (LangChain, LlamaIndex)


Key Features

One-Command Crawl

Give it a start URL and a glob pattern for the pages to include, and get structured JSON output.
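A glob pattern such as `https://example.com/docs/**` decides which discovered links get crawled. The following is a simplified, hypothetical matcher written only to illustrate the idea. It is not the tool's own matching code, which may handle edge cases differently. In this sketch `**` spans path segments, while `*` and `?` stay within a single segment.

```typescript
// Hypothetical glob -> RegExp conversion (illustration only):
//   "**" matches any run of characters, including "/"
//   "*"  matches any run of characters except "/"
//   "?"  matches exactly one character except "/"
// Every other character is matched literally.
function globToRegExp(glob: string): RegExp {
  let pattern = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === "*") {
      if (glob[i + 1] === "*") {
        pattern += ".*";
        i++; // consume the second "*"
      } else {
        pattern += "[^/]*";
      }
    } else if (ch === "?") {
      pattern += "[^/]";
    } else {
      // Escape regex metacharacters so ".", "+", "(", etc. stay literal.
      pattern += ch.replace(/[.+^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${pattern}$`);
}

// A URL is crawled if it matches any of the given patterns.
function matchesAny(url: string, globs: string | string[]): boolean {
  const list = typeof globs === "string" ? [globs] : globs;
  return list.some((g) => globToRegExp(g).test(url));
}

const docsGlob = "https://example.com/docs/**";
console.log(matchesAny("https://example.com/docs/getting-started", docsGlob)); // true
console.log(matchesAny("https://example.com/blog/post", docsGlob)); // false
```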

Smart Extraction

Extracts the text inside a CSS selector you choose, so you can target the main content area and leave out navigation, ads, and other boilerplate. The result is clean text suited to LLMs.

Configurable

  • maxPagesToCrawl: maximum number of pages to crawl
  • match: URL glob pattern(s) a page must match to be crawled
  • selector: CSS selector for content extraction
  • maxTokens: cap on output size, useful for staying within GPT upload limits
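A configuration combining these options might look like the sketch below. The `CrawlConfig` interface and `validateConfig` helper are hypothetical stand-ins. The option names come from the list above, and `url` (the start page) and `outputFileName` are added for completeness. Check the exact field names and types against the repository's own config file before relying on them.

```typescript
// Hypothetical stand-in for the tool's config type (illustration only).
interface CrawlConfig {
  url: string;               // page where the crawl starts
  match: string | string[];  // glob(s) a discovered URL must match to be crawled
  selector?: string;         // CSS selector whose text is extracted
  maxPagesToCrawl: number;   // stop after this many pages
  outputFileName: string;    // where the JSON array is written
  maxTokens?: number;        // optional cap on output size, in tokens
}

const config: CrawlConfig = {
  url: "https://example.com/docs/getting-started",
  match: "https://example.com/docs/**",
  selector: "main",
  maxPagesToCrawl: 50,
  outputFileName: "output.json",
  maxTokens: 2000000,
};

// Basic sanity checks before starting a crawl (hypothetical helper).
function validateConfig(c: CrawlConfig): string[] {
  const problems: string[] = [];
  if (!/^https?:\/\//.test(c.url)) problems.push("url must start with http:// or https://");
  if (!Number.isInteger(c.maxPagesToCrawl) || c.maxPagesToCrawl < 1)
    problems.push("maxPagesToCrawl must be a positive integer");
  if (c.maxTokens !== undefined && c.maxTokens < 1)
    problems.push("maxTokens, if set, must be positive");
  if (!c.outputFileName.endsWith(".json")) problems.push("outputFileName should end in .json");
  return problems;
}

console.log(validateConfig(config)); // []
```

Keeping `match` narrower than the whole site (here, only `/docs/**`) is what keeps blog posts and marketing pages out of the knowledge file.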

Output Formats

JSON array of {title, url, text} objects — ready for:

  • OpenAI GPT Builder (upload as knowledge)
  • Claude Projects (upload as context)
  • Any RAG vector store ingestion
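Before ingestion into a vector store, pages are usually split into smaller chunks. This sketch parses an output array and cuts each page's text into pieces of bounded length, tagging each piece with its source URL. The chunk size, the `Chunk` shape, and the `chunkPages` function are assumptions for illustration and are not part of GPT Crawler. The record shape follows the description above. The code also accepts an `html` field for the page text in case your output file uses that name, which is an assumption to verify against your own output.

```typescript
// One crawled page, following the {title, url, text} shape described above.
// `html` is accepted as a fallback name for the text field (an assumption).
interface CrawledPage {
  title: string;
  url: string;
  text?: string;
  html?: string;
}

interface Chunk {
  id: string;      // `${url}#${index}`
  title: string;
  url: string;
  content: string;
}

// Split each page's text into chunks of at most maxChars characters,
// breaking at a space where possible so words are not cut in half.
function chunkPages(json: string, maxChars = 1000): Chunk[] {
  const pages = JSON.parse(json) as CrawledPage[];
  const chunks: Chunk[] = [];
  for (const page of pages) {
    const body = (page.text ?? page.html ?? "").trim();
    let start = 0;
    let index = 0;
    while (start < body.length) {
      let end = Math.min(start + maxChars, body.length);
      if (end < body.length) {
        const lastSpace = body.lastIndexOf(" ", end);
        if (lastSpace > start) end = lastSpace;
      }
      const content = body.slice(start, end).trim();
      if (content) {
        chunks.push({ id: `${page.url}#${index}`, title: page.title, url: page.url, content });
        index++;
      }
      start = end;
    }
  }
  return chunks;
}

const sample = JSON.stringify([
  { title: "Intro", url: "https://example.com/docs", text: "alpha beta gamma delta" },
]);
const chunks = chunkPages(sample, 11);
console.log(chunks.map((c) => c.content)); // [ 'alpha beta', 'gamma', 'delta' ]
```

Character counts are only a rough proxy for tokens; a tokenizer-based budget would match model limits more closely.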

FAQ

Q: What is GPT Crawler?
A: A tool that crawls a website and outputs structured JSON for creating custom GPTs and RAG knowledge bases. It has no AI cost because it only crawls. 22K+ stars.

Q: How is it different from Crawl4AI or Firecrawl?
A: GPT Crawler is simpler and focused specifically on generating GPT knowledge files. Crawl4AI and Firecrawl offer broader feature sets, such as structured extraction and, in Firecrawl's case, a hosted API.



Source & Thanks

Created by Builder.io. Licensed under ISC. Repository: BuilderIO/gpt-crawler (22,000+ GitHub stars)
