Build knowledge graphs from documents for smarter RAG. Local and global search over entity relationships. By Microsoft Research. 31K+ stars.
TokRepo Picks · Community
Quick Use
```bash
pip install graphrag
```
```bash
# Initialize a project
graphrag init --root ./my-project
# Index the documents in ./my-project/input (builds the knowledge graph)
graphrag index --root ./my-project
# Query with local search (entity-focused)
graphrag query --root ./my-project --method local \
  --query "What are the main findings about transformer architectures?"
# Query with global search (holistic summary)
graphrag query --root ./my-project --method global \
  --query "What are the key themes across all documents?"
```
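Configure your LLM provider in the `settings.yaml` that `graphrag init` generates (key names differ between GraphRAG releases, so match the layout of your generated file):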
```yaml
llm:
  type: openai_chat
  model: gpt-4o
  api_key: ${GRAPHRAG_API_KEY}
```
---
Intro
GraphRAG is a modular, graph-based Retrieval-Augmented Generation (RAG) system from Microsoft Research with 31,900+ GitHub stars. Traditional RAG retrieves text chunks by vector similarity alone. GraphRAG first extracts a structured knowledge graph from your documents (entities, relationships, and community structures), then uses that graph to answer questions that depend on connections across the corpus. It offers two search modes: **local search** for questions about specific entities, and **global search** for holistic questions about the entire corpus. In Microsoft Research's evaluation, GraphRAG beat naive RAG on comprehensiveness and diversity for holistic, corpus-wide questions.
Works with: OpenAI GPT-4, Anthropic Claude, Azure OpenAI, any OpenAI-compatible API. Best for teams building RAG over large document collections that need multi-hop reasoning. Setup time: under 10 minutes.
---
## How GraphRAG Works
### Traditional RAG vs GraphRAG
| Aspect | Traditional RAG | GraphRAG |
|--------|----------------|----------|
| **Indexing** | Chunk text → embed → vector store | Chunk text → extract entities/relations → build graph → detect communities → summarize |
| **Retrieval** | Top-K similar chunks | Graph traversal + community reports |
| **Strengths** | Fast, simple | Multi-hop reasoning, holistic understanding |
| **Weaknesses** | Misses cross-document connections | Higher indexing cost (LLM calls) |
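To make the "Retrieval" row concrete, here is a minimal sketch of the top-K similarity lookup that traditional RAG relies on. The toy 3-dimensional vectors and the `top_k` helper are illustrative assumptions, not part of GraphRAG or of any vector store API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, chunk_vecs, k=2):
    """Return the ids of the k chunks most similar to the query."""
    ranked = sorted(
        chunk_vecs,
        key=lambda chunk_id: cosine(query_vec, chunk_vecs[chunk_id]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical embeddings; a real system would get these from an embedding model.
chunks = {
    "doc1-chunk0": [1.0, 0.0, 0.0],
    "doc1-chunk1": [0.9, 0.1, 0.0],
    "doc2-chunk0": [0.0, 1.0, 0.0],
}
hits = top_k([1.0, 0.05, 0.0], chunks, k=2)
print(hits)
```

Each chunk is scored on its own, so a fact that only emerges by linking `doc1` and `doc2` is surfaced only if both chunks happen to rank in the top K. That is the cross-document weakness listed in the table.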
### Indexing Pipeline
```
Documents (TXT, CSV, JSON)
│
├─ 1. Text Chunking
│     Split into overlapping chunks
│
├─ 2. Entity & Relationship Extraction (LLM)
│     "Albert Einstein" ──[worked_at]──> "Princeton"
│     "Einstein" ──[developed]──> "General Relativity"
│
├─ 3. Knowledge Graph Construction
│     Nodes: entities with descriptions
│     Edges: relationships with weights
│
├─ 4. Community Detection (Leiden algorithm)
│     Group related entities into clusters
│
└─ 5. Community Summarization (LLM)
      Generate report for each community
```
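Steps 3 and 4 of the pipeline above can be sketched in plain Python. This is a toy under stated assumptions: the triples are hard-coded stand-ins for LLM extraction output, the edge weight is simply a mention count, and connected components stand in for the Leiden algorithm (which is not in the standard library). None of these names come from GraphRAG's own code.

```python
from collections import defaultdict

# Hypothetical output of step 2: (source, relation, target) triples.
# A real run would get these from LLM extraction over each chunk.
triples = [
    ("Albert Einstein", "worked_at", "Princeton"),
    ("Albert Einstein", "developed", "General Relativity"),
    ("Albert Einstein", "developed", "General Relativity"),  # seen in a second chunk
    ("Marie Curie", "discovered", "Radium"),
]

def build_graph(triples):
    """Step 3: merge triples into an undirected weighted adjacency map.

    The weight is how often a pair of entities is linked (an assumption).
    """
    graph = defaultdict(lambda: defaultdict(int))
    for source, _relation, target in triples:
        graph[source][target] += 1
        graph[target][source] += 1
    return graph

def connected_components(graph):
    """Step 4 stand-in: group nodes by connectivity (NOT the Leiden algorithm)."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(n for n in graph[node] if n not in seen)
        groups.append(group)
    return groups

graph = build_graph(triples)
communities = connected_components(graph)
print(graph["Albert Einstein"]["General Relativity"])  # edge weight
print(len(communities))                                # number of clusters
```

Leiden would also split a large connected region into denser sub-communities, which connected components cannot do. The stand-in only shows where clustering fits in the flow.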
### Local Search
Answers questions about specific entities by combining:
- Entity descriptions from the knowledge graph
- Relationship context (neighboring entities)
- Relevant text chunks from source documents
- Community reports for broader context
```python
# Example: "What did Einstein contribute to quantum mechanics?"
# GraphRAG traverses the graph from "Einstein" node,
# follows edges to "quantum mechanics", "photoelectric effect",
# and retrieves relevant source chunks + community summaries
```
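The four ingredients listed above can be assembled into a single prompt context roughly as follows. This is a sketch, not GraphRAG's API: the in-memory dictionaries, the `neighborhood` and `build_local_context` helpers, and the two-hop limit are all hypothetical choices made for illustration.

```python
from collections import deque

# Hypothetical in-memory index; field names are illustrative only.
entities = {
    "Einstein": "Physicist known for relativity and the photoelectric effect.",
    "photoelectric effect": "Emission of electrons when light hits a material.",
    "quantum mechanics": "Theory describing matter and light at atomic scales.",
}
edges = {
    "Einstein": ["photoelectric effect"],
    "photoelectric effect": ["Einstein", "quantum mechanics"],
    "quantum mechanics": ["photoelectric effect"],
}
source_chunks = {
    "Einstein": ["In 1905 Einstein explained the photoelectric effect using light quanta."],
}
community_of = {name: "early-quantum-theory" for name in entities}
community_reports = {
    "early-quantum-theory": "Work linking light quanta to the foundations of quantum mechanics.",
}

def neighborhood(seed, edges, max_hops=2):
    """Breadth-first walk from the seed entity, up to max_hops edges away."""
    seen, order = {seed}, [seed]
    queue = deque([(seed, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append((neighbor, depth + 1))
    return order

def build_local_context(seed, max_hops=2):
    """Collect entity descriptions, source chunks, and community reports."""
    nodes = neighborhood(seed, edges, max_hops)
    parts = [f"[entity] {n}: {entities[n]}" for n in nodes]
    parts += [f"[chunk] {c}" for n in nodes for c in source_chunks.get(n, [])]
    reports = sorted({community_of[n] for n in nodes if n in community_of})
    parts += [f"[community] {r}: {community_reports[r]}" for r in reports]
    return "\n".join(parts)

context = build_local_context("Einstein")
print(context)
```

The resulting text would be passed to the LLM together with the user's question. Limiting the walk to a couple of hops keeps the context small while still reaching "quantum mechanics" from "Einstein".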
### Global Search
Answers holistic questions using community reports in a map-reduce pattern:
1. **Map**: Each community report answers the question independently
2. **Reduce**: Responses are aggregated and synthesized into a final answer
```python
# Example: "What are the major themes in this research corpus?"
# GraphRAG draws on community summaries from across the corpus to build a comprehensive overview.
# Chunk-level retrieval struggles here because no single chunk contains this information.
```
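The map-reduce pattern can be sketched with the two LLM calls replaced by stubs so the flow is runnable offline. `global_search`, `stub_answer`, and `stub_combine` are hypothetical names used only for this illustration; in a real deployment both stages would call the configured LLM.

```python
def stub_answer(question, report):
    """Stand-in for the map-stage LLM call: returns the report's first sentence."""
    return report.split(".")[0].strip()

def stub_combine(question, partials):
    """Stand-in for the reduce-stage LLM call: joins the partial answers."""
    return "; ".join(partials)

def global_search(question, reports, answer_fn, combine_fn):
    """Map each community report to a partial answer, then reduce to one answer."""
    partials = [answer_fn(question, report) for report in reports]  # Map
    partials = [p for p in partials if p]                           # drop empty answers
    return combine_fn(question, partials)                           # Reduce

# Hypothetical community reports produced at indexing time.
reports = [
    "Relativity research dominates this cluster. It centers on Einstein and Princeton.",
    "Radioactivity studies form a second theme. Key entities are Curie and radium.",
]
answer = global_search(
    "What are the major themes in this research corpus?",
    reports,
    stub_answer,
    stub_combine,
)
print(answer)
```

Because every report contributes a partial answer, the final response reflects the whole corpus rather than the handful of chunks closest to the query.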
### Configuration Options
```yaml
# settings.yaml
llm:
  type: openai_chat
  model: gpt-4o
  api_key: ${GRAPHRAG_API_KEY}
chunks:
  size: 1200
  overlap: 100
entity_extraction:
  max_gleanings: 1      # Re-extraction passes for quality
community_reports:
  max_length: 2000      # Summary length per community
snapshots:
  graphml: true         # Export graph for visualization
```
```
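To see what `size: 1200` and `overlap: 100` imply, here is a small sliding-window sketch that treats both values as token counts. The `chunk_tokens` helper and the synthetic token list are assumptions for illustration; GraphRAG's own chunker may differ in detail.

```python
def chunk_tokens(tokens, size=1200, overlap=100):
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    stride = size - overlap  # each new chunk starts this many tokens later
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reached the end of the document
    return chunks

# Synthetic 3,000-token document (placeholder tokens, not real tokenizer output).
tokens = [f"tok{i}" for i in range(3000)]
chunks = chunk_tokens(tokens)
print(len(chunks), [len(c) for c in chunks])
```

With these settings each chunk advances 1,100 tokens, so a larger `overlap` produces more chunks and therefore more LLM extraction calls during indexing.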
### Performance Benchmarks
From the Microsoft Research evaluation:
- **Comprehensiveness**: GraphRAG wins 72-83% of comparisons against naive RAG on holistic queries
- **Diversity of answers**: GraphRAG wins 73-82% of comparisons on breadth of response
- **Specific entity queries**: Local search performs comparably to traditional RAG
- **Indexing cost**: roughly $5-15 per 1M input tokens, depending on the model
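
The indexing cost figure scales linearly with corpus size, so a rough budget is simple arithmetic. The helper below is a back-of-the-envelope sketch that applies the $5-15 per 1M tokens range quoted above; the function name and the 4M-token corpus are hypothetical.

```python
def indexing_cost_range(corpus_tokens, low_per_million=5.0, high_per_million=15.0):
    """Return a (low, high) USD estimate for indexing a corpus of the given size."""
    millions = corpus_tokens / 1_000_000
    return millions * low_per_million, millions * high_per_million

# Example: a hypothetical corpus of 4 million input tokens.
low, high = indexing_cost_range(4_000_000)
print(f"Estimated indexing cost: ${low:.0f}-${high:.0f}")
```

Actual spend depends on the model and on settings such as chunk overlap and `max_gleanings`, so treat the result as an order-of-magnitude guide.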
---
## FAQ
**Q: What is GraphRAG?**
A: GraphRAG is Microsoft Research's open-source, graph-based RAG system with 31,900+ GitHub stars. It extracts a knowledge graph from your documents and retrieves via graph traversal plus community summaries, enabling multi-hop reasoning that chunk-level vector RAG handles poorly.
**Q: When should I use GraphRAG instead of regular RAG?**
A: Use GraphRAG when your questions require reasoning across multiple documents, understanding relationships between entities, or summarizing themes across a corpus. For simple factual lookup from a single document, traditional RAG is faster and cheaper.
**Q: Is GraphRAG free?**
A: Yes. The code is open source under the MIT license. You still pay for LLM API calls during indexing and querying, and indexing costs scale with corpus size.
---
Source & Thanks
> Created by [Microsoft Research](https://github.com/microsoft). Licensed under MIT.
>
> [graphrag](https://github.com/microsoft/graphrag) — ⭐ 31,900+
Thanks to Microsoft Research for advancing RAG with knowledge graph techniques.