Configs · March 31, 2026 · 1 min read

GPT Crawler — Build Custom GPTs from Any Website

Crawl any website to generate knowledge files for custom GPTs and RAG. Output as JSON for OpenAI GPTs or any LLM knowledge base. Zero config. 22K+ stars.

TokRepo Featured · Community
Quick Start

Use it first; decide later whether to dig deeper.

The commands below show what to copy first, what to install, and where the output lands.

npx gpt-crawler --url https://docs.example.com --match "https://docs.example.com/**"

Or configure:

// config.ts
export const config = {
  url: "https://docs.example.com",
  match: "https://docs.example.com/**",
  maxPagesToCrawl: 100,
  outputFileName: "output.json",
};
Or run from source:

git clone https://github.com/BuilderIO/gpt-crawler.git
cd gpt-crawler && npm install && npm start

Upload output.json to OpenAI GPT Builder or your RAG pipeline.
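The JSON shape described on this page maps directly onto a RAG ingest step. Below is a minimal sketch that splits crawled pages into embedding-sized chunks; it assumes the `{title, url, text}` record shape stated above, and `toChunks` plus the 800-character limit are illustrative choices, not part of GPT Crawler itself:

```typescript
// Sketch: turn GPT Crawler output (a JSON array of {title, url, text}
// objects, per this page) into chunks for a RAG vector store.
// The chunk size and helper name are illustrative assumptions.

interface CrawledPage {
  title: string;
  url: string;
  text: string;
}

interface Chunk {
  source: string; // originating page URL, kept for citations
  content: string;
}

function toChunks(pages: CrawledPage[], maxChars = 800): Chunk[] {
  const chunks: Chunk[] = [];
  for (const page of pages) {
    // Split on paragraph boundaries, then pack greedily up to maxChars.
    const paras = page.text.split(/\n{2,}/);
    let buf = "";
    for (const p of paras) {
      if (buf && buf.length + p.length + 2 > maxChars) {
        chunks.push({ source: page.url, content: buf });
        buf = "";
      }
      buf = buf ? buf + "\n\n" + p : p;
    }
    if (buf) chunks.push({ source: page.url, content: buf });
  }
  return chunks;
}

// Example with an in-memory sample instead of reading output.json:
const sample: CrawledPage[] = [
  {
    title: "Intro",
    url: "https://docs.example.com/intro",
    text: "A".repeat(500) + "\n\n" + "B".repeat(500),
  },
];
console.log(toChunks(sample).length); // 2: the two paragraphs exceed 800 chars combined
```

In a real pipeline you would `JSON.parse(readFileSync("output.json", "utf8"))` instead of the inline sample, then embed each chunk's `content` and store `source` as metadata.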


Introduction

GPT Crawler turns any website into a knowledge file for custom GPTs and RAG pipelines. Point it at documentation, help centers, or any website — it crawls pages, extracts clean text, and outputs structured JSON ready for OpenAI's GPT Builder or any LLM knowledge base. Zero AI cost — it's a pure crawler, not an LLM app. 22,000+ GitHub stars, ISC licensed.

Best for: Creating custom GPTs from documentation sites, building RAG knowledge bases from web content
Works with: OpenAI GPTs, Claude Projects, any RAG pipeline (LangChain, LlamaIndex)


Key Features

One-Command Crawl

Point at any URL with a glob pattern — get structured JSON output.

Smart Extraction

Extracts main content, strips navigation/ads/boilerplate. Clean text optimized for LLMs.

Configurable

  • maxPagesToCrawl — cap the number of pages crawled
  • match — URL glob patterns to include/exclude
  • selector — CSS selector for content extraction
  • maxTokens — limit output size for GPT upload
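The options above combine in a single config file. A sketch extending the earlier config.ts, with illustrative values (the selector, page count, and token cap here are examples, not defaults):

```typescript
// config.ts — illustrative values for the options listed above
export const config = {
  url: "https://docs.example.com",
  match: "https://docs.example.com/**",
  selector: "article", // extract text only from <article> elements
  maxPagesToCrawl: 200, // stop after 200 pages
  maxTokens: 2000000, // keep output under GPT Builder upload limits
  outputFileName: "output.json",
};
```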

Output Formats

JSON array of {title, url, text} objects — ready for:

  • OpenAI GPT Builder (upload as knowledge)
  • Claude Projects (upload as context)
  • Any RAG vector store ingestion

FAQ

Q: What is GPT Crawler? A: A tool that crawls any website and outputs structured JSON for creating custom GPTs and RAG knowledge bases. No AI cost — pure web crawling. 22K+ stars.

Q: How is it different from Crawl4AI or Firecrawl? A: GPT Crawler is simpler — focused specifically on generating GPT knowledge files. Crawl4AI and Firecrawl offer more features (JS rendering, structured extraction, APIs).



Source & Credits

Created by Builder.io. Licensed under ISC. BuilderIO/gpt-crawler — 22,000+ GitHub stars.
