Introduction
Kuzu is an embedded graph database built from the ground up for high-performance analytical queries on property graphs. Developed by researchers at the University of Waterloo, it uses columnar storage and vectorized execution to deliver fast multi-hop traversals and pattern matching while running as a library inside your application with no separate server process.
What Kuzu Does
- Stores and queries property graphs with node and relationship tables using Cypher
- Executes multi-hop and recursive path queries efficiently, using worst-case optimal join algorithms for cyclic patterns
- Provides an embeddable library for Python, Node.js, Rust, Java, and C/C++
- Supports structured property types including lists, maps, structs, and unions
- Imports data from CSV, Parquet, NumPy, Pandas, and Arrow sources
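As a rough illustration of the first two points, the sketch below defines a tiny property graph and runs a variable-length path query through the Python client. The `Person` and `Knows` tables, the sample rows, and the helper function names are invented for this example. The import of `kuzu` is deferred into `open_connection` so that the query helpers accept any connection-like object.

```python
def open_connection(db_path):
    """Open (or create) an embedded Kuzu database and return a connection."""
    import kuzu  # third-party package, assumed installed via `pip install kuzu`

    db = kuzu.Database(db_path)
    return kuzu.Connection(db)


# Hypothetical schema and data, used only for illustration.
SETUP_STATEMENTS = [
    "CREATE NODE TABLE Person(name STRING, age INT64, PRIMARY KEY (name))",
    "CREATE REL TABLE Knows(FROM Person TO Person, since INT64)",
    "CREATE (:Person {name: 'Ada', age: 36})",
    "CREATE (:Person {name: 'Bo', age: 29})",
    "CREATE (:Person {name: 'Cy', age: 41})",
    "MATCH (a:Person {name: 'Ada'}), (b:Person {name: 'Bo'}) "
    "CREATE (a)-[:Knows {since: 2019}]->(b)",
    "MATCH (a:Person {name: 'Bo'}), (b:Person {name: 'Cy'}) "
    "CREATE (a)-[:Knows {since: 2021}]->(b)",
]

# Variable-length pattern: follow Knows edges for one or two hops.
TWO_HOP_QUERY = (
    "MATCH (a:Person)-[:Knows*1..2]->(b:Person) "
    "WHERE a.name = $name RETURN DISTINCT b.name ORDER BY b.name"
)


def load_demo_graph(conn):
    """Create the schema and insert the sample nodes and relationships."""
    for statement in SETUP_STATEMENTS:
        conn.execute(statement)


def reachable_within_two_hops(conn, name):
    """Return the names reachable from `name` in at most two hops."""
    result = conn.execute(TWO_HOP_QUERY, {"name": name})
    names = []
    while result.has_next():
        names.append(result.get_next()[0])
    return names
```

With the package installed, calling `load_demo_graph(conn)` and then `reachable_within_two_hops(conn, "Ada")` on a connection from `open_connection` should walk Ada to Bo to Cy.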
Architecture Overview
Kuzu uses a columnar storage layout optimized for graph workloads. The query processor combines factorized query execution with worst-case optimal join algorithms, which avoid the intermediate-result blowup that plans built only from binary joins can produce on many-to-many and cyclic patterns. A buffer manager and disk-based storage let Kuzu handle graphs larger than memory. Cypher queries are compiled into vectorized physical plans that process values in batches for efficient CPU utilization.
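To make the intermediate-result blowup concrete, here is a toy comparison in plain Python. It does not reproduce Kuzu's internal data structures; it only shows the counting argument behind factorization, and all function names are invented for the example. A node with 1,000 incoming and 1,000 outgoing edges yields one million two-hop paths when every path is materialized, while a factorized form keeps the two sides separate and leaves their Cartesian product implicit.

```python
from itertools import product


def flat_result(sources, hub, targets):
    """Materialize every (source, hub, target) path as its own tuple."""
    return [(s, hub, t) for s, t in product(sources, targets)]


def factorized_result(sources, hub, targets):
    """Keep the two sides separate; their Cartesian product is implied."""
    return {"sources": list(sources), "hub": hub, "targets": list(targets)}


def stored_values(factorized):
    """Count the values a factorized result actually holds."""
    return len(factorized["sources"]) + 1 + len(factorized["targets"])


sources = [f"s{i}" for i in range(1000)]
targets = [f"t{i}" for i in range(1000)]

flat = flat_result(sources, "hub", targets)
compact = factorized_result(sources, "hub", targets)

print(len(flat))               # 1000 * 1000 = 1000000 tuples
print(stored_values(compact))  # 1000 + 1 + 1000 = 2001 values
```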
Self-Hosting & Configuration
- Install via pip, npm, cargo, or Maven depending on your language
- Create a database by pointing Kuzu at a directory path; Kuzu creates and manages the files inside it
- Import large graphs using the COPY FROM command with CSV or Parquet files
- Cap memory use by setting the buffer_pool_size parameter (in bytes) when opening the database
- Use the CLI shell (kuzu_shell) for interactive exploration and schema management
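The setup steps above might look like the following in Python. This is a sketch under assumptions: the `Person` table, the CSV layout, and the helper names are made up, and the `(header=true)` option tells `COPY FROM` to skip the header row that the helper writes. The `kuzu` import is deferred into `open_connection` so the CSV and statement helpers run on the standard library alone.

```python
import csv
import tempfile
from pathlib import Path


def write_people_csv(csv_path, rows):
    """Write (name, age) rows to a CSV file with a header line."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["name", "age"])
        writer.writerows(rows)


def copy_statement(table, csv_path):
    """Build a COPY FROM statement for a CSV file that has a header row."""
    return f'COPY {table} FROM "{Path(csv_path).as_posix()}" (header=true)'


def open_connection(db_path, buffer_pool_bytes):
    """Open a database with a capped buffer pool and return a connection."""
    import kuzu  # third-party package, assumed installed via `pip install kuzu`

    db = kuzu.Database(db_path, buffer_pool_size=buffer_pool_bytes)
    return kuzu.Connection(db)


def bulk_load(conn, csv_path):
    """Declare the (hypothetical) Person table, then bulk-import the CSV."""
    conn.execute(
        "CREATE NODE TABLE Person(name STRING, age INT64, PRIMARY KEY (name))"
    )
    conn.execute(copy_statement("Person", csv_path))


workdir = Path(tempfile.mkdtemp())
csv_path = workdir / "people.csv"
write_people_csv(csv_path, [("Ada", 36), ("Bo", 29)])
print(copy_statement("Person", csv_path))
```

With the package installed, `bulk_load(open_connection(str(workdir / "demo_db"), 1024**3), csv_path)` would load the file into a database limited to a 1 GiB buffer pool.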
Key Features
- Cypher query language is familiar to users of Neo4j and other graph databases
- Worst-case optimal joins prevent performance cliffs on complex graph patterns
- Columnar and vectorized execution brings analytical database speed to graph queries
- Structured schema with typed node and relationship tables ensures data integrity
- Zero-copy integration with Apache Arrow and Pandas for data science workflows
Comparison with Similar Tools
- Neo4j: Neo4j is a client-server graph database; Kuzu runs embedded in the application and uses columnar storage for analytical performance
- DuckDB: DuckDB is an analytical SQL database; Kuzu is purpose-built for graph pattern matching and recursive queries
- CozoDB: CozoDB uses Datalog; Kuzu uses Cypher and worst-case optimal joins for graph-specific optimization
- SQLite: SQLite is a relational database; Kuzu handles graph traversals that would require complex recursive CTEs in SQL
- Amazon Neptune: Neptune is a managed cloud graph service; Kuzu is a free embeddable library with no infrastructure cost
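To illustrate the SQLite comparison, the sketch below answers "who is reachable from Ada within two hops?" with a recursive CTE, using only Python's built-in `sqlite3` module. The `knows` table and its rows are invented for the example, and the Cypher string is shown for contrast only and is not executed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE knows (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO knows VALUES (?, ?)",
    [("Ada", "Bo"), ("Bo", "Cy"), ("Cy", "Di")],
)

# The recursive CTE tracks the hop count by hand and stops at the limit.
RECURSIVE_CTE = """
WITH RECURSIVE reach(name, hops) AS (
    SELECT dst, 1 FROM knows WHERE src = ?
    UNION
    SELECT k.dst, r.hops + 1
    FROM reach AS r JOIN knows AS k ON k.src = r.name
    WHERE r.hops < ?
)
SELECT DISTINCT name FROM reach ORDER BY name
"""

rows = conn.execute(RECURSIVE_CTE, ("Ada", 2)).fetchall()
within_two_hops = [row[0] for row in rows]
print(within_two_hops)  # ['Bo', 'Cy']

# A Cypher pattern expressing the same question over hypothetical
# Person/Knows tables (not run in this sketch):
CYPHER_EQUIVALENT = (
    "MATCH (a:Person {name: 'Ada'})-[:Knows*1..2]->(b:Person) "
    "RETURN DISTINCT b.name ORDER BY b.name"
)
```

The SQL version has to carry the hop counter and the join condition explicitly, whereas the Cypher pattern states the hop range inline as `*1..2`.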
FAQ
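Q: Does Kuzu use Cypher or a custom query language? A: Kuzu implements Cypher based on the openCypher standard, the same language family used by Neo4j, so existing Cypher knowledge largely transfers. The main difference is that Kuzu requires node and relationship tables to be declared with a typed schema before data is loaded.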
Q: Can Kuzu handle graphs that don't fit in memory? A: Yes. Kuzu uses disk-based storage with a buffer manager, so it can process graphs larger than available RAM.
Q: Is Kuzu suitable for transactional workloads? A: Kuzu supports ACID transactions, but it is optimized for analytical graph queries. For high-throughput OLTP, a server-based graph database may be more appropriate.
Q: Can I use Kuzu with GraphRAG or knowledge graph applications? A: Yes. Kuzu is well-suited for knowledge graph storage and retrieval, and integrates with LangChain and LlamaIndex for RAG pipelines.