Introduction
nano-graphrag is a lightweight, easy-to-modify reimplementation of Microsoft's GraphRAG approach. It extracts entities and relationships from documents to build a knowledge graph, then uses graph-based community detection and summarization to answer questions that require understanding connections across multiple documents.
What nano-graphrag Does
- Extracts named entities and their relationships from text using LLM-based extraction
- Builds a knowledge graph and runs community detection to identify topic clusters
- Generates community summaries that serve as a compressed representation of document themes
- Supports both local search (entity-focused) and global search (theme-focused) query modes
- Provides a simple Python API for inserting documents and querying the graph
Architecture Overview
The pipeline has three phases. First, documents are chunked and processed by an LLM to extract entity-relationship triples. These triples are stored in a graph (NetworkX by default, with optional Neo4j backend). Second, the Leiden community detection algorithm groups related entities into hierarchical communities, and an LLM generates summaries for each community. Third, at query time, the system retrieves relevant entities or community summaries based on the query type and feeds them as context to the LLM for answer generation. The entire process is designed to be readable and modifiable in under 1,000 lines of core code.
Self-Hosting & Configuration
- Install via pip with Python 3.10+; no external services required for the default setup
- Configure the LLM backend by passing model client parameters (supports OpenAI, Ollama, and custom endpoints)
- Swap the graph storage backend from in-memory NetworkX to Neo4j for larger datasets
- Adjust entity extraction prompts and community detection resolution in the configuration
- Embedding models and vector storage are configurable for hybrid retrieval approaches
Key Features
- Minimal, readable codebase designed for learning and customization
- Full GraphRAG pipeline: entity extraction, graph construction, community detection, and graph-aware retrieval
- Both local (specific entity) and global (broad theme) query modes
- Pluggable storage backends for graph, vector, and key-value data
- Incremental insertion allows adding documents to an existing knowledge graph without rebuilding
Comparison with Similar Tools
- Microsoft GraphRAG — the original reference implementation; nano-graphrag is simpler, faster to set up, and easier to customize
- LightRAG — another lightweight GraphRAG variant; nano-graphrag stays closer to the original paper's methodology
- LlamaIndex Knowledge Graph — graph-enhanced RAG within LlamaIndex; nano-graphrag is a standalone focused tool
- R2R — production RAG system with optional graph support; nano-graphrag is a learning-friendly, hackable implementation
- Neo4j GenAI — graph database with LLM integration; nano-graphrag provides the full extraction and query pipeline
FAQ
Q: How is this different from regular vector-based RAG? A: Vector RAG retrieves similar text chunks independently. GraphRAG extracts entities and relationships, builds a knowledge graph, and uses graph structure to answer questions that require connecting information across multiple documents.
Q: How large a corpus can nano-graphrag handle? A: With the default in-memory backend, it works well for hundreds of documents. For larger corpora, switch to the Neo4j backend for persistent, scalable graph storage.
Q: Which LLM providers are supported? A: OpenAI by default, with built-in support for Ollama and any OpenAI-compatible API endpoint. Custom LLM clients can be passed as parameters.
Q: Can I modify the entity extraction prompts? A: Yes. The extraction prompts are exposed as configurable templates, making it straightforward to adapt extraction for domain-specific terminology.