# GPT Crawler — Build Custom GPTs from Any Website

> Crawl any website to generate knowledge files for custom GPTs and RAG. Outputs JSON for OpenAI GPTs or any LLM knowledge base. Zero config. 22K+ stars.

## Quick Use

Run in your project root:

```bash
npx gpt-crawler --url https://docs.example.com --match "https://docs.example.com/**"
```

Or save a config file in your project root:

```typescript
// config.ts
export const config = {
  url: "https://docs.example.com",
  match: "https://docs.example.com/**",
  maxPagesToCrawl: 100,
  outputFileName: "output.json",
};
```

Or run from a clone of the repository:

```bash
git clone https://github.com/BuilderIO/gpt-crawler.git
cd gpt-crawler && npm install && npm start
```

Upload `output.json` to OpenAI's GPT Builder or your RAG pipeline.

---

## Intro

GPT Crawler turns any website into a knowledge file for custom GPTs and RAG pipelines. Point it at documentation, help centers, or any other site — it crawls pages, extracts clean text, and outputs structured JSON ready for OpenAI's GPT Builder or any LLM knowledge base. Zero AI cost — it's a pure crawler, not an LLM app. 22,000+ GitHub stars, ISC licensed.

**Best for**: creating custom GPTs from documentation sites; building RAG knowledge bases from web content

**Works with**: OpenAI GPTs, Claude Projects, any RAG pipeline (LangChain, LlamaIndex)

---

## Key Features

### One-Command Crawl

Point at any URL with a glob pattern — get structured JSON output.

### Smart Extraction

Extracts the main content and strips navigation, ads, and boilerplate. Clean text, optimized for LLMs.
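The minimal config above can be extended with the tuning options this document lists under "Configurable". A sketch only — the option names (`selector`, `maxTokens`) come from that list, but the example values here (`"main"`, the token cap) are assumptions; check the project README for the authoritative schema.

```typescript
// config.ts — extended sketch; option names taken from this document's
// "Configurable" list. The concrete values below are illustrative guesses.
export const config = {
  url: "https://docs.example.com",
  match: "https://docs.example.com/**",
  maxPagesToCrawl: 100,        // stop after this many pages
  selector: "main",            // CSS selector for content extraction (assumed value)
  maxTokens: 2_000_000,        // cap output size for GPT upload (assumed value)
  outputFileName: "output.json",
};
```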
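For RAG ingestion, the extracted pages usually need to be split into smaller pieces before embedding. A minimal TypeScript sketch, assuming the documented output schema of `{title, url, text}` objects; `chunkPage` is a hypothetical helper for illustration, not part of gpt-crawler:

```typescript
// Shape of one record in output.json, per the documented schema.
interface CrawledPage {
  title: string;
  url: string;
  text: string;
}

interface Chunk {
  title: string;
  url: string;
  text: string;
  index: number; // position of this chunk within its source page
}

// Hypothetical helper: split one crawled page into fixed-size chunks,
// carrying the source metadata so each vector-store entry can cite its URL.
function chunkPage(page: CrawledPage, maxChars = 1000): Chunk[] {
  const chunks: Chunk[] = [];
  for (let i = 0; i * maxChars < page.text.length; i++) {
    chunks.push({
      title: page.title,
      url: page.url,
      text: page.text.slice(i * maxChars, (i + 1) * maxChars),
      index: i,
    });
  }
  return chunks;
}
```

Usage would be reading `output.json` and flattening every page, e.g. `pages.flatMap((p) => chunkPage(p))`, before handing the chunks to an embedding step.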
### Configurable

- `maxPagesToCrawl` — limit the number of pages crawled
- `match` — URL glob patterns to include/exclude
- `selector` — CSS selector for content extraction
- `maxTokens` — limit output size for GPT upload

### Output Formats

A JSON array of `{title, url, text}` objects — ready for:

- OpenAI GPT Builder (upload as knowledge)
- Claude Projects (upload as context)
- Any RAG vector store ingestion

---

## FAQ

**Q: What is GPT Crawler?**

A: A tool that crawls any website and outputs structured JSON for creating custom GPTs and RAG knowledge bases. No AI cost — pure web crawling. 22K+ stars.

**Q: How is it different from Crawl4AI or Firecrawl?**

A: GPT Crawler is simpler — it focuses specifically on generating GPT knowledge files. Crawl4AI and Firecrawl offer more features (JS rendering, structured extraction, APIs).

---

## Source & Thanks

> Created by [Builder.io](https://github.com/BuilderIO). Licensed under ISC.
> [BuilderIO/gpt-crawler](https://github.com/BuilderIO/gpt-crawler) — 22,000+ GitHub stars

---

Source: https://tokrepo.com/en/workflows/bbd3962b-db9b-4ce9-9efe-31f44d08fdff

Author: AI Open Source