Configs · Mar 31, 2026 · 2 min read

GPT Crawler — Build Custom GPTs from Any Website

Crawl any website to generate knowledge files for custom GPTs and RAG. Output as JSON for OpenAI GPTs or any LLM knowledge base. Zero config. 22K+ stars.

Introduction

GPT Crawler turns any website into a knowledge file for custom GPTs and RAG pipelines. Point it at documentation, help centers, or any website — it crawls pages, extracts clean text, and outputs structured JSON ready for OpenAI's GPT Builder or any LLM knowledge base. Zero AI cost — it's a pure crawler, not an LLM app. 22,000+ GitHub stars, ISC licensed.

Best for: Creating custom GPTs from documentation sites, building RAG knowledge bases from web content

Works with: OpenAI GPTs, Claude Projects, any RAG pipeline (LangChain, LlamaIndex)


Key Features

One-Command Crawl

Point at any URL with a glob pattern — get structured JSON output.

Smart Extraction

Extracts main content, strips navigation/ads/boilerplate. Clean text optimized for LLMs.

Configurable

  • maxPagesToCrawl: cap on the number of pages the crawl visits
  • match: URL glob pattern(s) selecting which links to follow
  • selector: CSS selector for content extraction
  • maxTokens: limit output size for GPT upload
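
As a rough illustration, the options listed above might be combined in a config object like the one below. This is a sketch under assumptions, not the project's actual config file: the `CrawlConfig` type, the `url` field, the `validateConfig` helper, and the example.com addresses are all hypothetical and introduced only for this example. Check the project's README for the real config shape.

```typescript
// Hypothetical local type: it mirrors only the four options named above,
// plus an assumed `url` field for the page where the crawl starts.
interface CrawlConfig {
  url: string;              // assumed: starting page of the crawl
  match: string | string[]; // glob pattern(s) for links to follow
  selector?: string;        // CSS selector for the content to extract
  maxPagesToCrawl: number;  // upper bound on pages visited
  maxTokens?: number;       // cap on output size
}

// Placeholder values; example.com stands in for a real documentation site.
const exampleConfig: CrawlConfig = {
  url: "https://example.com/docs",
  match: "https://example.com/docs/**",
  selector: "main",
  maxPagesToCrawl: 50,
  maxTokens: 2000000,
};

// Hypothetical helper: basic sanity checks before handing the config to a crawler.
// Returns a list of human-readable problems; an empty list means the config passed.
function validateConfig(config: CrawlConfig): string[] {
  const problems: string[] = [];
  if (!/^https?:\/\//.test(config.url)) {
    problems.push("url must start with http:// or https://");
  }
  if (!Number.isInteger(config.maxPagesToCrawl) || config.maxPagesToCrawl < 1) {
    problems.push("maxPagesToCrawl must be a positive integer");
  }
  if (config.maxTokens !== undefined && config.maxTokens < 1) {
    problems.push("maxTokens must be positive");
  }
  return problems;
}

console.log(validateConfig(exampleConfig));
```

Keeping `maxPagesToCrawl` small while tuning `match` and `selector` is a cheap way to inspect a few pages of output before committing to a full crawl.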

Output Formats

JSON array of {title, url, text} objects — ready for:

  • OpenAI GPT Builder (upload as knowledge)
  • Claude Projects (upload as context)
  • Any RAG vector store ingestion
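
For the vector store case, the output usually needs to be split into smaller pieces before embedding. The sketch below assumes the record shape described above ({title, url, text}) and uses an inline JSON sample in place of a real output file. The `chunkPages` function, the `Chunk` type, and the `<url>#<index>` id scheme are hypothetical names invented for this example, and the character-based splitting is a simple stand-in for whatever chunking strategy a given pipeline prefers.

```typescript
// Record shape as described on this page; verify field names against real output.
interface CrawledPage {
  title: string;
  url: string;
  text: string;
}

// Hypothetical chunk record for ingestion into a vector store.
interface Chunk {
  id: string; // hypothetical id scheme: "<url>#<chunk index>"
  title: string;
  url: string;
  text: string;
}

// Split each page's text on whitespace into chunks of roughly maxChars characters.
// A single word longer than maxChars is kept whole rather than cut.
function chunkPages(pages: CrawledPage[], maxChars: number): Chunk[] {
  const chunks: Chunk[] = [];
  for (const page of pages) {
    const words = page.text.split(/\s+/).filter((w) => w.length > 0);
    let current = "";
    let index = 0;
    // Emit the chunk being built, if any, and start a new one.
    const flush = (): void => {
      if (current.length > 0) {
        chunks.push({ id: `${page.url}#${index}`, title: page.title, url: page.url, text: current });
        index += 1;
        current = "";
      }
    };
    for (const word of words) {
      const candidate = current.length === 0 ? word : `${current} ${word}`;
      if (candidate.length > maxChars && current.length > 0) {
        flush();
        current = word;
      } else {
        current = candidate;
      }
    }
    flush();
  }
  return chunks;
}

// Inline sample standing in for a crawler output file.
const sampleJson = `[
  {"title": "Getting Started", "url": "https://example.com/docs/start",
   "text": "Install the package. Run the crawler. Upload the output."},
  {"title": "Empty", "url": "https://example.com/docs/empty", "text": ""}
]`;

const pages = JSON.parse(sampleJson) as CrawledPage[];
const chunks = chunkPages(pages, 30);
for (const chunk of chunks) {
  console.log(chunk.id, JSON.stringify(chunk.text));
}
```

Carrying `title` and `url` on every chunk lets a retrieval step cite the source page alongside the matched text. Pages with no extracted text produce no chunks.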

FAQ

Q: What is GPT Crawler?
A: A tool that crawls any website and outputs structured JSON for creating custom GPTs and RAG knowledge bases. It is a pure web crawler, so there is no AI cost. 22K+ stars.

Q: How is it different from Crawl4AI or Firecrawl?
A: GPT Crawler is simpler and focused specifically on generating GPT knowledge files. Crawl4AI and Firecrawl offer more features (JS rendering, structured extraction, APIs).

Source and Acknowledgments

Created by Builder.io. Licensed under ISC. BuilderIO/gpt-crawler — 22,000+ GitHub stars
