Introduction
R2R (Reason to Retrieve) is a production-grade retrieval-augmented generation (RAG) framework built by SciPhi AI. It covers the full path from raw documents to an agentic RAG pipeline in a single deployable service, so there is no need to stitch together separate vector databases, embedding services, and LLM orchestration layers.
What R2R Does
- Ingests documents in 20+ formats (PDF, DOCX, HTML, Markdown, images with OCR) and chunks them automatically
- Runs hybrid search combining vector similarity and keyword matching with reciprocal rank fusion
- Builds and queries a knowledge graph alongside the vector index for multi-hop reasoning
- Exposes a full RESTful API for document management, search, RAG, and agent interactions
- Supports agentic RAG where the system plans retrieval strategies and iterates on answers
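The reciprocal rank fusion step mentioned in the hybrid search bullet can be illustrated with a minimal Python sketch. This is a generic version of the technique, not R2R's internal code: the function name `reciprocal_rank_fusion`, the constant `k=60` (a commonly used default), and the sample document IDs are all assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the two retrievers.
vector_hits = ["doc_a", "doc_b", "doc_c"]   # ordered by vector similarity
keyword_hits = ["doc_b", "doc_d", "doc_a"]  # ordered by keyword match
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Because the fusion uses only rank positions, the raw similarity and keyword scores never need to be on a comparable scale, which is the usual reason for choosing this method in hybrid search.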
Architecture Overview
R2R runs as a containerized service with three main subsystems: an ingestion pipeline that parses, chunks, and embeds documents into PostgreSQL with pgvector; a retrieval engine that performs hybrid search and optional graph traversal; and an agentic orchestrator that chains retrieval, reasoning, and tool use. The system uses Hatchet for async task orchestration and exposes all functionality through a FastAPI-based REST interface. A Python SDK and CLI wrap the API for developer convenience.
Self-Hosting & Configuration
- Deploy with Docker Compose for a single-command setup that includes PostgreSQL (with the pgvector extension) and the R2R server
- Configure LLM and embedding providers via environment variables (supports OpenAI, Anthropic, local models)
- Customize chunking strategy, overlap, and embedding dimensions in the TOML config
- Enable the knowledge graph module by setting the graph provider configuration
- Scale horizontally by adding worker instances behind the task queue
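To make the chunk size and overlap settings concrete, here is a minimal Python sketch of fixed-size chunking with overlap. It is an illustration only and assumes character-based windows; R2R's actual chunking strategies are set in its configuration and may split on different units. The function `chunk_text` and its parameter names are hypothetical.

```python
def chunk_text(text, chunk_size=512, overlap=64):
    """Split text into fixed-size character windows that overlap.

    Consecutive chunks share `overlap` characters, so a sentence cut at a
    chunk boundary still appears whole in at least one neighbouring chunk
    (as long as it is shorter than the overlap).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


chunks = chunk_text("abcdefghij", chunk_size=4, overlap=2)
print(chunks)
```

A larger overlap preserves more context across boundaries but increases the number of chunks that must be embedded and stored.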
Key Features
- End-to-end RAG in a single service: ingestion, embedding, search, generation, and agent orchestration
- Hybrid retrieval with vector search, full-text search, and knowledge graph traversal
- Multi-tenant architecture with user-level document permissions and access control
- Agentic RAG mode where the system autonomously decides when and how to retrieve
- Built-in evaluation endpoints for measuring retrieval and generation quality
Comparison with Similar Tools
- LangChain: general-purpose LLM framework requiring assembly; R2R is an integrated, deployable RAG service
- LlamaIndex: strong indexing library but needs external infrastructure; R2R bundles everything in one deployable stack
- Haystack: modular pipeline framework; R2R trades modularity for faster time-to-production
- RAGFlow: document-focused RAG engine; R2R adds agentic capabilities and knowledge graph support
- Verba: Weaviate-based RAG UI; R2R is backend-focused with a full API and more retrieval strategies
FAQ
Q: What database does R2R use? A: PostgreSQL. The pgvector extension provides vector storage, and the optional knowledge graph is stored in PostgreSQL as well. Everything runs in the provided Docker Compose stack.
Q: Can I use local models instead of OpenAI? A: Yes. R2R supports any OpenAI-compatible endpoint including Ollama, vLLM, and other local inference servers.
Q: How does the knowledge graph work? A: R2R extracts entities and relationships from ingested documents and stores them in a graph structure. During retrieval, the agent can traverse the graph for multi-hop reasoning alongside vector search.
Q: Is R2R suitable for production workloads? A: Yes. It includes authentication, multi-tenancy, async task processing, and horizontal scaling support.
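The multi-hop traversal described in the knowledge graph answer above can be pictured with a small Python sketch. It assumes relationships are available as (subject, relation, object) triples and uses a plain breadth-first search; the function names and the sample triples are hypothetical and do not reflect R2R's storage schema or API.

```python
from collections import defaultdict, deque


def build_graph(triples):
    """Index (subject, relation, object) triples by subject entity."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph


def multi_hop(graph, start, max_hops=2):
    """Breadth-first search returning, for each entity reachable from
    `start` within `max_hops` edges, the chain of triples leading to it."""
    paths = {start: []}
    queue = deque([start])
    while queue:
        entity = queue.popleft()
        if len(paths[entity]) >= max_hops:
            continue  # do not expand beyond the hop limit
        for relation, neighbor in graph.get(entity, []):
            if neighbor not in paths:  # first visit is the shortest path
                paths[neighbor] = paths[entity] + [(entity, relation, neighbor)]
                queue.append(neighbor)
    return {entity: path for entity, path in paths.items() if entity != start}


# Illustrative triples of the kind an extraction step might produce.
triples = [
    ("R2R", "built_by", "SciPhi AI"),
    ("R2R", "stores_vectors_in", "PostgreSQL"),
    ("PostgreSQL", "extended_by", "pgvector"),
]
graph = build_graph(triples)
two_hop = multi_hop(graph, "R2R", max_hops=2)
print(two_hop["pgvector"])
```

Here "pgvector" is only reachable through "PostgreSQL", which is the kind of two-step connection that a single vector similarity lookup would not surface on its own.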