Introduction
NebulaGraph is a distributed, horizontally scalable graph database designed to store and query graphs with hundreds of billions of vertices and trillions of edges at millisecond-level latency. Written in C++ with a shared-nothing architecture, it targets fraud detection, knowledge graph, recommendation, and cybersecurity workloads.
What NebulaGraph Does
- Stores property graphs (vertices with tags, typed edges, and their properties) with strongly consistent, Raft-replicated writes.
- Executes nGQL, a declarative graph language with Cypher-like syntax.
- Scales reads and writes linearly by sharding graph partitions across servers.
- Supports multi-graph isolation via "spaces" with independent schemas.
- Integrates with Spark, Flink, Kafka, and graph analytics libraries for large-scale processing.
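As a rough illustration of the nGQL workflow listed above (create a space, define a schema, insert data, traverse), the following sketch only builds statement strings; it does not connect to a cluster. The space, tag, and edge names (demo, person, follow) and the helper functions are hypothetical, and the statements assume nGQL 3.x syntax.

```python
# Sketch only: renders nGQL statements as strings, with no client or server involved.
# All schema names (demo, person, follow) and helper names are hypothetical.


def quote(value):
    """Render a Python value as an nGQL literal. Only strings and numbers are handled."""
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return str(value)


def create_space(name, partitions=10, replicas=3, vid_len=32):
    """Build a CREATE SPACE statement with a fixed-length string vertex ID."""
    return (
        f"CREATE SPACE IF NOT EXISTS {name} "
        f"(partition_num={partitions}, replica_factor={replicas}, "
        f"vid_type=FIXED_STRING({vid_len}));"
    )


def insert_vertex(tag, props, vid, values):
    """Build an INSERT VERTEX statement for a single vertex."""
    cols = ", ".join(props)
    vals = ", ".join(quote(v) for v in values)
    return f"INSERT VERTEX {tag}({cols}) VALUES {quote(vid)}:({vals});"


def go_from(vid, edge, steps=1):
    """Build a GO traversal that returns destination vertex IDs."""
    return f"GO {steps} STEPS FROM {quote(vid)} OVER {edge} YIELD dst(edge) AS id;"


statements = [
    create_space("demo"),
    "USE demo;",
    "CREATE TAG IF NOT EXISTS person(name string, age int);",
    "CREATE EDGE IF NOT EXISTS follow(degree int);",
    insert_vertex("person", ["name", "age"], "p1", ["Alice", 30]),
    go_from("p1", "follow", steps=2),
]

for stmt in statements:
    print(stmt)
```

The printed statements could be pasted into a console session or passed to one of the client SDKs. Each space carries its own partition count, replica factor, and schema, which is what keeps multiple graphs isolated from one another.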
Architecture Overview
Three services run independently: Meta (cluster metadata and schema), Graph (query parsing, planning, and execution), and Storage (a sharded key-value store on RocksDB with Raft replication). Partitions are balanced across storage hosts. The Graph service parses and optimizes each query, then dispatches it to the storage hosts that hold the relevant partitions, pushing filters and index scans down to storage where possible. Large graph algorithms run separately on Spark through nebula-algorithm.
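The sharding idea can be sketched as hashing a vertex ID and taking it modulo the partition count. This is a minimal model under stated assumptions: the hash function here (BLAKE2 from the standard library) is a stand-in and not necessarily what the storage engine uses, the 1-based partition numbering is an assumption of the sketch, and the function names are hypothetical.

```python
import hashlib


def partition_for(vid, partition_num):
    """Map a vertex ID to a 1-based partition ID by hashing and taking a modulus.

    Assumption: BLAKE2 is a stand-in hash chosen for determinism in this sketch.
    """
    digest = hashlib.blake2b(str(vid).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % partition_num + 1


def group_by_partition(vids, partition_num):
    """Group vertex IDs by the partition that would own them."""
    groups = {}
    for vid in vids:
        groups.setdefault(partition_for(vid, partition_num), []).append(vid)
    return groups


# Spread 1000 hypothetical vertex IDs over 10 partitions and print the counts.
groups = group_by_partition([f"user{i}" for i in range(1000)], 10)
for part_id in sorted(groups):
    print(part_id, len(groups[part_id]))
```

Because the mapping depends only on the vertex ID and the partition count, any Graph service instance can compute which partition to contact without a lookup, and each partition can then be replicated as its own Raft group.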
Self-Hosting & Configuration
- Install via RPM/DEB, Docker Compose, Helm chart (nebula-operator), or Kubernetes Operator.
- A typical prod cluster runs 3 meta + 3+ graph + 3+ storage services with Raft quorum.
- Tune wal_ttl, rocksdb_block_cache, and num_io_threads for workload characteristics.
- Enable authentication (--enable_authorize=true), TLS, and RBAC in nebula-*.conf.
- Export with Nebula Exchange to/from Hive, Neo4j, ClickHouse, CSV, and Parquet.
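Two points from the list above can be made concrete with a small sketch: why replica counts are usually odd under Raft, and what the flag lines in the configuration files look like. The numeric values below are placeholders for illustration, not tuning recommendations, and the helper names are hypothetical. The sketch assumes the gflags-style `--name=value` line format and that the storage flags belong in the storage service's file while enable_authorize belongs in the graph service's file; check the configuration reference for your version.

```python
def raft_quorum(replicas):
    """Smallest majority of a Raft group with the given number of replicas."""
    return replicas // 2 + 1


def tolerated_failures(replicas):
    """How many replicas can be lost while a majority remains."""
    return replicas - raft_quorum(replicas)


def render_flags(flags):
    """Render a dict as gflags-style lines (--name=value), one per line."""
    lines = []
    for name, value in flags.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"--{name}={value}")
    return "\n".join(lines)


# Placeholder values for illustration only; not tuning recommendations.
storage_flags = {"wal_ttl": 14400, "rocksdb_block_cache": 4096, "num_io_threads": 16}
graph_flags = {"enable_authorize": True}

for replicas in (3, 4, 5):
    print(replicas, raft_quorum(replicas), tolerated_failures(replicas))

print(render_flags(storage_flags))
print(render_flags(graph_flags))
```

With 3 replicas the quorum is 2 and one failure is tolerated; a fourth replica raises the quorum to 3 without tolerating any additional failure, which is why clusters typically grow from 3 to 5 rather than to 4.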
Key Features
- Shared-nothing design lets you scale out by just adding storage nodes.
- nGQL + Cypher-compatible clauses lower the learning curve for Neo4j users.
- GeoSpatial types, full-text search via Elasticsearch, and built-in graph algorithms.
- Pushing vertex and edge filters down to the storage layer, together with RocksDB bloom filters, keeps multi-hop traversals fast.
- Visualization with NebulaGraph Studio and Explorer; Python / Java / Go SDKs.
Comparison with Similar Tools
- Neo4j — Single-writer leader design; Nebula scales horizontally with multi-shard writes.
- JanusGraph — Depends on Cassandra/HBase; Nebula ships its own distributed storage.
- Dgraph — GraphQL-native; Nebula chooses nGQL for more flexible graph traversals.
- TigerGraph — Proprietary; Nebula is Apache-2.0 open source.
- ArangoDB — Multi-model; Nebula specializes purely in graph for lower latency.
FAQ
Q: What is the largest graph it can handle? A: Production deployments store trillions of edges across 100+ storage nodes with horizontal sharding.
Q: Is nGQL compatible with Cypher? A: Nebula supports a subset of OpenCypher syntax, making migrations from Neo4j approachable.
Q: Can I run graph ML on it? A: Yes, via nebula-algorithm (GraphX/Spark) and integrations with DGL and PyTorch Geometric.
Q: How do I back up a cluster? A: Use br (Nebula Backup & Restore) to snapshot meta + storage to S3, GCS, or local disk.