Configs · March 31, 2026 · 1 min read

GPT Crawler — Build Custom GPTs from Any Website

Crawl any website to generate knowledge files for custom GPTs and RAG. Output as JSON for OpenAI GPTs or any LLM knowledge base. Zero config. 22K+ stars.

TokRepo Featured · Community
Quick Start

Use it first; decide later whether to dig deeper.

The commands below show what to copy first, what to install, and where the output lands.

npx gpt-crawler --url https://docs.example.com --match "https://docs.example.com/**"

Or configure:

// config.ts
export const config = {
  url: "https://docs.example.com",
  match: "https://docs.example.com/**",
  maxPagesToCrawl: 100,
  outputFileName: "output.json",
};
Or run from source:

git clone https://github.com/BuilderIO/gpt-crawler.git
cd gpt-crawler && npm install && npm start

Upload output.json to OpenAI GPT Builder or your RAG pipeline.
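The JSON shape described on this page maps directly onto a RAG ingest step. Below is a minimal sketch that splits crawled pages into embedding-sized chunks; it assumes the `{title, url, text}` record shape stated above, and `toChunks` plus the 800-character limit are illustrative choices, not part of GPT Crawler itself:

```typescript
// Sketch: turn GPT Crawler output (a JSON array of {title, url, text}
// objects, per this page) into chunks for a RAG vector store.
// The chunk size and helper name are illustrative assumptions.

interface CrawledPage {
  title: string;
  url: string;
  text: string;
}

interface Chunk {
  source: string; // originating page URL, kept for citations
  content: string;
}

function toChunks(pages: CrawledPage[], maxChars = 800): Chunk[] {
  const chunks: Chunk[] = [];
  for (const page of pages) {
    // Split on paragraph boundaries, then pack greedily up to maxChars.
    const paras = page.text.split(/\n{2,}/);
    let buf = "";
    for (const p of paras) {
      if (buf && buf.length + p.length + 2 > maxChars) {
        chunks.push({ source: page.url, content: buf });
        buf = "";
      }
      buf = buf ? buf + "\n\n" + p : p;
    }
    if (buf) chunks.push({ source: page.url, content: buf });
  }
  return chunks;
}

// Example with an in-memory sample instead of reading output.json:
const sample: CrawledPage[] = [
  {
    title: "Intro",
    url: "https://docs.example.com/intro",
    text: "A".repeat(500) + "\n\n" + "B".repeat(500),
  },
];
console.log(toChunks(sample).length); // 2: the two paragraphs exceed 800 chars combined
```

In a real pipeline you would `JSON.parse(readFileSync("output.json", "utf8"))` instead of the inline sample, then embed each chunk's `content` and store `source` as metadata.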


Introduction

GPT Crawler turns any website into a knowledge file for custom GPTs and RAG pipelines. Point it at documentation, help centers, or any website — it crawls pages, extracts clean text, and outputs structured JSON ready for OpenAI's GPT Builder or any LLM knowledge base. Zero AI cost — it's a pure crawler, not an LLM app. 22,000+ GitHub stars, ISC licensed.

Best for: Creating custom GPTs from documentation sites, building RAG knowledge bases from web content
Works with: OpenAI GPTs, Claude Projects, any RAG pipeline (LangChain, LlamaIndex)


Key Features

One-Command Crawl

Point at any URL with a glob pattern — get structured JSON output.

Smart Extraction

Extracts main content, strips navigation/ads/boilerplate. Clean text optimized for LLMs.

Configurable

  • maxPagesToCrawl — cap the number of pages crawled
  • match — URL glob patterns to include/exclude
  • selector — CSS selector for content extraction
  • maxTokens — limit output size for GPT upload
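The options above combine in a single config file. A sketch extending the earlier config.ts, with illustrative values (the selector, page count, and token cap here are examples, not defaults):

```typescript
// config.ts — illustrative values for the options listed above
export const config = {
  url: "https://docs.example.com",
  match: "https://docs.example.com/**",
  selector: "article", // extract text only from <article> elements
  maxPagesToCrawl: 200, // stop after 200 pages
  maxTokens: 2000000, // keep output under GPT Builder upload limits
  outputFileName: "output.json",
};
```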

Output Formats

JSON array of {title, url, text} objects — ready for:

  • OpenAI GPT Builder (upload as knowledge)
  • Claude Projects (upload as context)
  • Any RAG vector store ingestion

FAQ

Q: What is GPT Crawler? A: A tool that crawls any website and outputs structured JSON for creating custom GPTs and RAG knowledge bases. No AI cost — pure web crawling. 22K+ stars.

Q: How is it different from Crawl4AI or Firecrawl? A: GPT Crawler is simpler — focused specifically on generating GPT knowledge files. Crawl4AI and Firecrawl offer more features (JS rendering, structured extraction, APIs).



Source & Credits

Created by Builder.io. Licensed under ISC. BuilderIO/gpt-crawler — 22,000+ GitHub stars.
