Key Features
One-Command Crawl
Point at any URL with a glob pattern — get structured JSON output.
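To give a feel for how a glob pattern scopes a crawl, here is a deliberately simplified matcher. It is an illustration only, not the matcher the tool uses: `globToRegExp` is a hypothetical helper that supports just `*` (within one path segment) and `**` (across segments), and the example.com URLs are placeholders.

```typescript
// Simplified glob-to-RegExp conversion (illustrative; real glob libraries
// support more syntax and handle more edge cases).
function globToRegExp(glob: string): RegExp {
  let pattern = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*") {
      if (glob.charAt(i + 1) === "*") {
        pattern += ".*"; // "**" matches any characters, including "/"
        i++;
      } else {
        pattern += "[^/]*"; // "*" matches within a single path segment
      }
    } else {
      // Escape characters that have special meaning in a regular expression.
      pattern += ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp("^" + pattern + "$");
}

const docsOnly = globToRegExp("https://example.com/docs/**");
const inScope = docsOnly.test("https://example.com/docs/guide/install");
const outOfScope = docsOnly.test("https://example.com/blog/post");
```

With this sketch, `inScope` is true and `outOfScope` is false, so only pages under `/docs/` would be followed.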
Smart Extraction
Extracts main content, strips navigation/ads/boilerplate. Clean text optimized for LLMs.
Configurable
- maxPagesToCrawl: limit on the number of pages crawled
- match: URL glob patterns controlling which pages are included
- selector: CSS selector for content extraction
- maxTokens: cap on output size (in tokens) for GPT upload
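As a rough sketch, the options above might be combined like this. The `CrawlConfig` type is declared locally for the example and is not claimed to be the tool's exported type; the `url` field name, the `validateConfig` helper, and every value (example.com addresses, the `main` selector, the numeric limits) are assumptions chosen for illustration.

```typescript
// Hypothetical local type mirroring the options listed above.
interface CrawlConfig {
  url: string; // starting page (assumed field name)
  match: string | string[]; // URL glob pattern(s)
  selector?: string; // CSS selector for content extraction
  maxPagesToCrawl: number; // upper bound on pages crawled
  maxTokens?: number; // cap on output size
}

const config: CrawlConfig = {
  url: "https://example.com/docs",
  match: "https://example.com/docs/**",
  selector: "main",
  maxPagesToCrawl: 50,
  maxTokens: 2000000,
};

// Illustrative sanity check; returns a list of human-readable problems.
function validateConfig(c: CrawlConfig): string[] {
  const problems: string[] = [];
  if (!/^https?:\/\//.test(c.url)) {
    problems.push("url must start with http:// or https://");
  }
  if (!Number.isInteger(c.maxPagesToCrawl) || c.maxPagesToCrawl < 1) {
    problems.push("maxPagesToCrawl must be a positive integer");
  }
  if (c.maxTokens !== undefined && c.maxTokens < 1) {
    problems.push("maxTokens must be at least 1 when set");
  }
  return problems;
}

const configProblems = validateConfig(config);
```

Keeping `maxPagesToCrawl` small while tuning `match` and `selector` makes it cheaper to check that the crawl picks up the intended pages before running it at full size.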
Output Formats
JSON array of {title, url, text} objects — ready for:
- OpenAI GPT Builder (upload as knowledge)
- Claude Projects (upload as context)
- Any RAG vector store ingestion
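One way to prepare this output for vector-store ingestion is sketched below. It assumes only the {title, url, text} shape described above; the function names, the character-based chunk size, and the `url#index` id scheme are illustrative choices, and the sample data is made up.

```typescript
// One crawled page, in the {title, url, text} shape described above.
interface CrawledPage {
  title: string;
  url: string;
  text: string;
}

// A chunk ready to be embedded and stored (hypothetical record shape).
interface ChunkRecord {
  id: string;
  url: string;
  title: string;
  text: string;
}

function isCrawledPage(value: unknown): value is CrawledPage {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v["title"] === "string" &&
    typeof v["url"] === "string" &&
    typeof v["text"] === "string"
  );
}

// Parse the crawler output and drop entries that lack a required field.
function parsePages(json: string): CrawledPage[] {
  const data: unknown = JSON.parse(json);
  if (!Array.isArray(data)) throw new Error("expected a JSON array of pages");
  return data.filter(isCrawledPage);
}

// Group words into chunks of roughly maxChars characters. A single word
// longer than maxChars is kept whole, so a chunk can exceed the limit.
function chunkText(text: string, maxChars: number): string[] {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  let current = "";
  for (const word of words) {
    const candidate = current === "" ? word : current + " " + word;
    if (candidate.length > maxChars && current !== "") {
      chunks.push(current);
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}

function toChunkRecords(pages: CrawledPage[], maxChars: number): ChunkRecord[] {
  const records: ChunkRecord[] = [];
  for (const page of pages) {
    chunkText(page.text, maxChars).forEach((text, i) => {
      records.push({ id: page.url + "#" + i, url: page.url, title: page.title, text });
    });
  }
  return records;
}

// Made-up sample: the second entry is missing `text` and is skipped.
const sample = JSON.stringify([
  { title: "Intro", url: "https://example.com/docs/intro", text: "alpha beta gamma delta" },
  { title: "Broken", url: "https://example.com/docs/broken" },
]);
const records = toChunkRecords(parsePages(sample), 11);
```

Carrying `url` and `title` on every chunk lets a retrieval step cite the source page. A production pipeline would likely chunk by tokens instead of characters.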
FAQ
Q: What is GPT Crawler? A: A tool that crawls any website and outputs structured JSON for creating custom GPTs and RAG knowledge bases. It is pure web crawling, so there is no AI cost. The project has 22K+ stars.
Q: How is it different from Crawl4AI or Firecrawl? A: GPT Crawler is simpler — focused specifically on generating GPT knowledge files. Crawl4AI and Firecrawl offer more features (JS rendering, structured extraction, APIs).