Configs2026年5月21日·1 分钟阅读

nano-graphrag — Lightweight GraphRAG Implementation

A simple, hackable implementation of Microsoft GraphRAG that builds knowledge graphs from documents and uses graph-based retrieval for more accurate LLM question answering.

Agent 就绪

这个资产可以被 Agent 直接读取和安装

TokRepo 同时提供通用 CLI 命令、安装契约、metadata JSON、按适配器生成的安装计划和原始内容链接,方便 Agent 判断适配度、风险和下一步动作。

Native · 98/100策略:允许
Agent 入口
任意 MCP/CLI Agent
类型
Skill
安装
Single
信任
信任等级:Established
入口
nano-graphrag
通用 CLI 安装命令
npx tokrepo install ce24ffd1-5530-11f1-9bc6-00163e2b0d79

Introduction

nano-graphrag is a lightweight, easy-to-modify reimplementation of Microsoft's GraphRAG approach. It extracts entities and relationships from documents to build a knowledge graph, then uses graph-based community detection and summarization to answer questions that require understanding connections across multiple documents.

What nano-graphrag Does

  • Extracts named entities and their relationships from text using LLM-based extraction
  • Builds a knowledge graph and runs community detection to identify topic clusters
  • Generates community summaries that serve as a compressed representation of document themes
  • Supports both local search (entity-focused) and global search (theme-focused) query modes
  • Provides a simple Python API for inserting documents and querying the graph

Architecture Overview

The pipeline has three phases. First, documents are chunked and processed by an LLM to extract entity-relationship triples. These triples are stored in a graph (NetworkX by default, with optional Neo4j backend). Second, the Leiden community detection algorithm groups related entities into hierarchical communities, and an LLM generates summaries for each community. Third, at query time, the system retrieves relevant entities or community summaries based on the query type and feeds them as context to the LLM for answer generation. The entire process is designed to be readable and modifiable in under 1,000 lines of core code.

Self-Hosting & Configuration

  • Install via pip with Python 3.10+; no external services required for the default setup
  • Configure the LLM backend by passing model client parameters (supports OpenAI, Ollama, and custom endpoints)
  • Swap the graph storage backend from in-memory NetworkX to Neo4j for larger datasets
  • Adjust entity extraction prompts and community detection resolution in the configuration
  • Embedding models and vector storage are configurable for hybrid retrieval approaches

Key Features

  • Minimal, readable codebase designed for learning and customization
  • Full GraphRAG pipeline: entity extraction, graph construction, community detection, and graph-aware retrieval
  • Both local (specific entity) and global (broad theme) query modes
  • Pluggable storage backends for graph, vector, and key-value data
  • Incremental insertion allows adding documents to an existing knowledge graph without rebuilding

Comparison with Similar Tools

  • Microsoft GraphRAG — the original reference implementation; nano-graphrag is simpler, faster to set up, and easier to customize
  • LightRAG — another lightweight GraphRAG variant; nano-graphrag stays closer to the original paper's methodology
  • LlamaIndex Knowledge Graph — graph-enhanced RAG within LlamaIndex; nano-graphrag is a standalone focused tool
  • R2R — production RAG system with optional graph support; nano-graphrag is a learning-friendly, hackable implementation
  • Neo4j GenAI — graph database with LLM integration; nano-graphrag provides the full extraction and query pipeline

FAQ

Q: How is this different from regular vector-based RAG? A: Vector RAG retrieves similar text chunks independently. GraphRAG extracts entities and relationships, builds a knowledge graph, and uses graph structure to answer questions that require connecting information across multiple documents.

Q: How large a corpus can nano-graphrag handle? A: With the default in-memory backend, it works well for hundreds of documents. For larger corpora, switch to the Neo4j backend for persistent, scalable graph storage.

Q: Which LLM providers are supported? A: OpenAI by default, with built-in support for Ollama and any OpenAI-compatible API endpoint. Custom LLM clients can be passed as parameters.

Q: Can I modify the entity extraction prompts? A: Yes. The extraction prompts are exposed as configurable templates, making it straightforward to adapt extraction for domain-specific terminology.

Sources

讨论

登录后参与讨论。
还没有评论,来写第一条吧。

相关资产