# GPT Crawler — Build Custom GPTs from Any Website

> Crawl any website to generate knowledge files for custom GPTs and RAG. Outputs JSON for OpenAI GPTs or any LLM knowledge base. Zero config. 22K+ stars.

## Quick Use

Run in your project root:

```bash
npx gpt-crawler --url https://docs.example.com --match "https://docs.example.com/**"
```

Or save a config file in your project root:

```typescript
// config.ts
export const config = {
  url: "https://docs.example.com",
  match: "https://docs.example.com/**",
  maxPagesToCrawl: 100,
  outputFileName: "output.json",
};
```

Or run from a clone of the repository:

```bash
git clone https://github.com/BuilderIO/gpt-crawler.git
cd gpt-crawler && npm install && npm start
```

Upload `output.json` to OpenAI's GPT Builder or your RAG pipeline.

---

## Intro

GPT Crawler turns any website into a knowledge file for custom GPTs and RAG pipelines. Point it at documentation, help centers, or any other site — it crawls pages, extracts clean text, and outputs structured JSON ready for OpenAI's GPT Builder or any LLM knowledge base. Zero AI cost — it's a pure crawler, not an LLM app. 22,000+ GitHub stars, ISC licensed.

**Best for**: creating custom GPTs from documentation sites; building RAG knowledge bases from web content

**Works with**: OpenAI GPTs, Claude Projects, any RAG pipeline (LangChain, LlamaIndex)

---

## Key Features

### One-Command Crawl

Point at any URL with a glob pattern — get structured JSON output.

### Smart Extraction

Extracts the main content and strips navigation, ads, and boilerplate. Clean text, optimized for LLMs.
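The minimal config above can be extended with the tuning options this document lists under "Configurable". A sketch only — the option names (`selector`, `maxTokens`) come from that list, but the example values here (`"main"`, the token cap) are assumptions; check the project README for the authoritative schema.

```typescript
// config.ts — extended sketch; option names taken from this document's
// "Configurable" list. The concrete values below are illustrative guesses.
export const config = {
  url: "https://docs.example.com",
  match: "https://docs.example.com/**",
  maxPagesToCrawl: 100,        // stop after this many pages
  selector: "main",            // CSS selector for content extraction (assumed value)
  maxTokens: 2_000_000,        // cap output size for GPT upload (assumed value)
  outputFileName: "output.json",
};
```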
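For RAG ingestion, the extracted pages usually need to be split into smaller pieces before embedding. A minimal TypeScript sketch, assuming the documented output schema of `{title, url, text}` objects; `chunkPage` is a hypothetical helper for illustration, not part of gpt-crawler:

```typescript
// Shape of one record in output.json, per the documented schema.
interface CrawledPage {
  title: string;
  url: string;
  text: string;
}

interface Chunk {
  title: string;
  url: string;
  text: string;
  index: number; // position of this chunk within its source page
}

// Hypothetical helper: split one crawled page into fixed-size chunks,
// carrying the source metadata so each vector-store entry can cite its URL.
function chunkPage(page: CrawledPage, maxChars = 1000): Chunk[] {
  const chunks: Chunk[] = [];
  for (let i = 0; i * maxChars < page.text.length; i++) {
    chunks.push({
      title: page.title,
      url: page.url,
      text: page.text.slice(i * maxChars, (i + 1) * maxChars),
      index: i,
    });
  }
  return chunks;
}
```

Usage would be reading `output.json` and flattening every page, e.g. `pages.flatMap((p) => chunkPage(p))`, before handing the chunks to an embedding step.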
### Configurable

- `maxPagesToCrawl` — limit the number of pages crawled
- `match` — URL glob patterns to include/exclude
- `selector` — CSS selector for content extraction
- `maxTokens` — limit output size for GPT upload

### Output Formats

A JSON array of `{title, url, text}` objects — ready for:

- OpenAI GPT Builder (upload as knowledge)
- Claude Projects (upload as context)
- Any RAG vector store ingestion

---

## FAQ

**Q: What is GPT Crawler?**

A: A tool that crawls any website and outputs structured JSON for creating custom GPTs and RAG knowledge bases. No AI cost — pure web crawling. 22K+ stars.

**Q: How is it different from Crawl4AI or Firecrawl?**

A: GPT Crawler is simpler — it focuses specifically on generating GPT knowledge files. Crawl4AI and Firecrawl offer more features (JS rendering, structured extraction, APIs).

---

## Source & Thanks

> Created by [Builder.io](https://github.com/BuilderIO). Licensed under ISC.
> [BuilderIO/gpt-crawler](https://github.com/BuilderIO/gpt-crawler) — 22,000+ GitHub stars

---

Source: https://tokrepo.com/en/workflows/bbd3962b-db9b-4ce9-9efe-31f44d08fdff

Author: AI Open Source