# How GraphRAG Works

## Traditional RAG vs GraphRAG
| Aspect | Traditional RAG | GraphRAG |
|---|---|---|
| Indexing | Chunk text → embed → vector store | Chunk text → extract entities/relations → build graph → detect communities → summarize |
| Retrieval | Top-K similar chunks | Graph traversal + community reports |
| Strengths | Fast, simple | Multi-hop reasoning, holistic understanding |
| Weakness | Misses cross-document connections | Higher indexing cost (LLM calls) |
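The retrieval difference in the table can be sketched with toy data: traditional RAG ranks chunks by vector similarity alone, while GraphRAG can walk relationships outward from an entity. This is an illustrative sketch, not the GraphRAG API; the helper names and data are invented.

```python
from collections import deque

# Traditional RAG: rank chunks by similarity to the query embedding.
def top_k(query_vec, chunk_vecs, k=2):
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    ranked = sorted(chunk_vecs.items(), key=lambda kv: dot(query_vec, kv[1]), reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:k]]

# GraphRAG-style local retrieval: breadth-first traversal from a seed entity,
# collecting multi-hop neighbors that similarity search alone would miss.
def traverse(graph, seed, max_hops=2):
    seen, frontier = {seed}, deque([(seed, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return seen

chunks = {"c1": [1.0, 0.0], "c2": [0.7, 0.7], "c3": [0.0, 1.0]}
graph = {"Einstein": ["Princeton", "General Relativity"],
         "General Relativity": ["Spacetime"]}

print(top_k([1.0, 0.2], chunks))    # similarity only: no notion of relationships
print(traverse(graph, "Einstein"))  # multi-hop context: reaches "Spacetime" in 2 hops
```

The traversal reaches "Spacetime" even though no chunk embedding ties it to the query, which is the cross-document connection vector-only retrieval misses.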
## Indexing Pipeline
```
Documents (PDF, TXT, CSV)
    │
    ├─ 1. Text Chunking
    │      Split into overlapping chunks
    │
    ├─ 2. Entity & Relationship Extraction (LLM)
    │      "Albert Einstein" ──[worked_at]──> "Princeton"
    │      "Einstein" ──[developed]──> "General Relativity"
    │
    ├─ 3. Knowledge Graph Construction
    │      Nodes: entities with descriptions
    │      Edges: relationships with weights
    │
    ├─ 4. Community Detection (Leiden algorithm)
    │      Group related entities into clusters
    │
    └─ 5. Community Summarization (LLM)
           Generate report for each community
```

## Local Search
Answers questions about specific entities by combining:
- Entity descriptions from the knowledge graph
- Relationship context (neighboring entities)
- Relevant text chunks from source documents
- Community reports for broader context
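The four context sources above can be combined into a single prompt context. The sketch below uses toy data structures; the real GraphRAG local-search implementation differs, and all names here are illustrative.

```python
# Sketch of local-search context assembly (hypothetical helper, toy data).
def build_local_context(entity, graph, text_chunks, community_reports, max_chunks=2):
    """Combine entity description, relationships, source chunks, and a
    community report into one context string for a seed entity."""
    node = graph["nodes"][entity]
    parts = [f"Entity: {entity} ({node['description']})"]
    # Relationship context: edges touching the seed entity.
    for src, rel, dst in graph["edges"]:
        if entity in (src, dst):
            parts.append(f"Relation: {src} -[{rel}]-> {dst}")
    # Relevant source text chunks that mention the entity.
    mentions = [c for c in text_chunks if entity in c]
    parts.extend(f"Chunk: {c}" for c in mentions[:max_chunks])
    # Community report for broader context.
    parts.append(f"Community report: {community_reports[node['community']]}")
    return "\n".join(parts)

graph = {
    "nodes": {"Einstein": {"description": "Physicist", "community": 0}},
    "edges": [("Einstein", "developed", "General Relativity")],
}
chunks = ["Einstein published the photoelectric-effect paper in 1905."]
reports = {0: "Cluster of 20th-century physics entities."}
print(build_local_context("Einstein", graph, chunks, reports))
```

The assembled context is what the LLM sees when answering an entity-specific question.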
```
# Example: "What did Einstein contribute to quantum mechanics?"
# GraphRAG traverses the graph from the "Einstein" node,
# follows edges to "quantum mechanics" and "photoelectric effect",
# and retrieves relevant source chunks + community summaries
```

## Global Search
Answers holistic questions using community reports in a map-reduce pattern:
- Map: Each community report answers the question independently
- Reduce: Responses are aggregated and synthesized into a final answer
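The map-reduce pattern above reduces to a few lines. In this sketch, `ask_llm` is a stand-in for a real chat-model call, not part of the GraphRAG API; the structure, not the stub, is the point.

```python
# Map-reduce sketch of global search over community reports.
def ask_llm(prompt: str) -> str:
    # Placeholder: a real implementation would call a chat model here.
    return f"[answer derived from: {prompt[:40]}...]"

def global_search(question, community_reports):
    # Map: every community report answers the question independently.
    partial_answers = [
        ask_llm(f"Using only this report, answer '{question}':\n{report}")
        for report in community_reports
    ]
    # Reduce: synthesize the partial answers into one final response.
    combined = "\n".join(partial_answers)
    return ask_llm(f"Synthesize these partial answers to '{question}':\n{combined}")

reports = ["Report on physics entities.", "Report on biology entities."]
print(global_search("What are the major themes?", reports))
```

Because every community report participates in the map step, the answer reflects the whole corpus rather than whichever chunks happened to rank highest.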
```
# Example: "What are the major themes in this research corpus?"
# GraphRAG uses ALL community summaries to provide a comprehensive overview
# Traditional RAG would fail: no single chunk contains this information
```

## Configuration Options
```yaml
# settings.yaml
llm:
  type: openai_chat
  model: gpt-4o
  api_key: ${GRAPHRAG_API_KEY}

chunks:
  size: 1200
  overlap: 100

entity_extraction:
  max_gleanings: 1      # re-extraction passes for quality

community_reports:
  max_length: 2000      # summary length per community

snapshots:
  graphml: true         # export graph for visualization
```

## Performance Benchmarks
From Microsoft Research's evaluation:

- Comprehensiveness: GraphRAG wins 72-83% of head-to-head comparisons against naive RAG on holistic queries
- Diversity of answers: GraphRAG wins 73-82% on breadth of response
- Specific entity queries: local search is comparable to traditional RAG
- Indexing cost: roughly $5-15 per 1M tokens of input, depending on the model
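The cost range above translates directly into a back-of-the-envelope budget for a given corpus. The rates below are the figures quoted in this document, not official pricing, and the helper is illustrative.

```python
# Estimate indexing cost from corpus size using the $5-15 per 1M-token range.
def estimate_indexing_cost(corpus_tokens: int,
                           cost_per_million_low: float = 5.0,
                           cost_per_million_high: float = 15.0) -> tuple[float, float]:
    millions = corpus_tokens / 1_000_000
    return (millions * cost_per_million_low, millions * cost_per_million_high)

low, high = estimate_indexing_cost(3_500_000)  # e.g. a 3.5M-token corpus
print(f"Estimated indexing cost: ${low:.2f}-${high:.2f}")
```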
## FAQ
**Q: What is GraphRAG?**
A: GraphRAG is Microsoft Research's open-source, graph-based RAG system (31,900+ GitHub stars). It extracts knowledge graphs from documents and uses graph traversal plus community summaries for retrieval, enabling multi-hop reasoning that traditional vector RAG cannot achieve.

**Q: When should I use GraphRAG instead of regular RAG?**
A: Use GraphRAG when your questions require reasoning across multiple documents, understanding relationships between entities, or summarizing themes across a corpus. For simple factual lookup from a single document, traditional RAG is faster and cheaper.

**Q: Is GraphRAG free?**
A: Yes, it is fully open source under the MIT license. You pay only for LLM API calls during indexing and querying; indexing costs scale with corpus size.