Configs · April 16, 2026 · 1 min read

NebulaGraph — Distributed Open-Source Graph Database

Horizontally scalable graph database for storing and querying hundreds of billions of vertices and trillions of edges with millisecond-level latency.

Introduction

NebulaGraph is a distributed, horizontally scalable graph database designed to store and query graphs with hundreds of billions of vertices and trillions of edges at millisecond-level latency. It is written in C++ with a shared-nothing architecture and targets fraud detection, knowledge graphs, recommendation systems, and cybersecurity workloads.

What NebulaGraph Does

  • Stores property graphs (vertices with tags, typed edges, and properties on both), with each partition replicated consistently through Raft.
  • Executes nGQL, a declarative graph language with Cypher-like syntax.
  • Scales reads and writes linearly by sharding graph partitions across servers.
  • Supports multi-graph isolation via "spaces" with independent schemas.
  • Integrates with Spark, Flink, Kafka, and graph analytics libraries for large-scale processing.
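As a rough illustration of the data model above, the following sketch builds the nGQL statements for a small graph: one space, one tag, one edge type, a few inserts, and a two-hop traversal. The space name `demo`, the tag `person`, the edge type `follow`, and the helper functions are hypothetical names chosen for this example; the script only assembles statement text and would still need a NebulaGraph client to run the statements against a cluster.

```python
def ngql_literal(value):
    """Render a Python value as an nGQL literal (strings, bools, numbers only)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return str(value)


def create_space(name, partitions=10, replicas=3, vid_len=32):
    # partition_num and replica_factor are illustrative values, not recommendations.
    return (
        f"CREATE SPACE IF NOT EXISTS {name} "
        f"(partition_num={partitions}, replica_factor={replicas}, "
        f"vid_type=FIXED_STRING({vid_len}));"
    )


def insert_vertex(tag, vid, props):
    names = ", ".join(props)
    values = ", ".join(ngql_literal(v) for v in props.values())
    return f"INSERT VERTEX {tag}({names}) VALUES {ngql_literal(vid)}:({values});"


def insert_edge(edge_type, src, dst, props):
    names = ", ".join(props)
    values = ", ".join(ngql_literal(v) for v in props.values())
    return (
        f"INSERT EDGE {edge_type}({names}) "
        f"VALUES {ngql_literal(src)}->{ngql_literal(dst)}:({values});"
    )


def go_from(vid, edge_type, steps=1):
    # Native nGQL traversal: walk `steps` hops over one edge type.
    return (
        f"GO {steps} STEPS FROM {ngql_literal(vid)} "
        f"OVER {edge_type} YIELD dst(edge) AS id;"
    )


statements = [
    create_space("demo"),
    "USE demo;",
    "CREATE TAG IF NOT EXISTS person(name string, age int);",
    "CREATE EDGE IF NOT EXISTS follow(degree int);",
    insert_vertex("person", "p1", {"name": "Ann", "age": 30}),
    insert_vertex("person", "p2", {"name": "Bob", "age": 41}),
    insert_edge("follow", "p1", "p2", {"degree": 90}),
    go_from("p1", "follow", steps=2),
]

for statement in statements:
    print(statement)
```

Each space carries its own schema, so the tag and edge type defined here are visible only after `USE demo;`.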

Architecture Overview

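Three services run independently: Meta (cluster metadata and schema), Graph (stateless query parsing, planning, and execution), and Storage (a sharded key-value store built on RocksDB, with each partition replicated through Raft). Partitions are balanced across storage hosts. The Graph service parses and optimizes each query, then dispatches it to the storage hosts that own the relevant partitions, pushing filters and index lookups down to the storage layer where possible. Large graph algorithms run outside the cluster in nebula-algorithm, a separate Spark application.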
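The sharding idea can be sketched in a few lines: hash a vertex ID to a partition, then map that partition to a group of replica hosts. This is a simplified model under stated assumptions, not NebulaGraph's actual code: CRC32 stands in for the database's own hash function, and the round-robin replica placement is hypothetical (in a real cluster the Meta service decides where partitions live). The host names are made up.

```python
import zlib


def partition_for(vid: str, partition_num: int) -> int:
    """Map a vertex ID to a 1-based partition ID.

    CRC32 is a stand-in hash used only for illustration; it is not the
    hash NebulaGraph applies internally.
    """
    return zlib.crc32(vid.encode("utf-8")) % partition_num + 1


def replica_hosts(partition_id: int, hosts: list, replica_factor: int) -> list:
    """Pick replica hosts for a partition with simple round-robin placement.

    Hypothetical placement rule: the real assignment is made by the Meta
    service and may differ.
    """
    if replica_factor > len(hosts):
        raise ValueError("replica_factor cannot exceed the number of hosts")
    start = (partition_id - 1) % len(hosts)
    return [hosts[(start + i) % len(hosts)] for i in range(replica_factor)]


hosts = ["storage-1", "storage-2", "storage-3", "storage-4"]  # made-up names
for vid in ["p1", "p2", "p3"]:
    part = partition_for(vid, partition_num=10)
    print(vid, "-> partition", part, "->", replica_hosts(part, hosts, 3))
```

Because the mapping depends only on the vertex ID and the partition count, any Graph service instance can work out which partition to contact without coordinating with the others.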

Self-Hosting & Configuration

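  • Install via RPM/DEB packages, Docker Compose, or on Kubernetes with NebulaGraph Operator (nebula-operator), which is installable through its Helm chart.
  • A typical production cluster runs 3 Meta, 3 or more Graph, and 3 or more Storage services; Meta and Storage replicate through Raft and need a quorum, while Graph is stateless.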
  • Tune wal_ttl, rocksdb_block_cache, and num_io_threads for workload characteristics.
  • Enable authentication (--enable_authorize=true) and TLS in nebula-*.conf, then assign roles with nGQL for role-based access control.
  • Move data with Nebula Exchange, a Spark-based tool, to and from Hive, Neo4j, ClickHouse, CSV, and Parquet.
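The reason production clusters start at three Meta and three Storage replicas follows from Raft's majority rule. The sketch below is generic Raft arithmetic rather than anything NebulaGraph-specific, and the function names are chosen for this example.

```python
def raft_quorum(replicas: int) -> int:
    """Smallest majority of a Raft group with `replicas` members."""
    if replicas < 1:
        raise ValueError("a Raft group needs at least one member")
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return replicas - raft_quorum(replicas)


for n in (1, 2, 3, 4, 5):
    print(f"replicas={n} quorum={raft_quorum(n)} tolerates={tolerated_failures(n)}")
```

Three replicas tolerate one failure and four still tolerate only one, which is why odd replica counts are the usual choice.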

Key Features

  • Shared-nothing design lets you scale out by just adding storage nodes.
  • nGQL + Cypher-compatible clauses lower the learning curve for Neo4j users.
  • GeoSpatial types, full-text search via Elasticsearch, and built-in graph algorithms.
  • Vertex and edge filters are pushed down to the storage layer, and RocksDB bloom filters cut unnecessary disk reads, which keeps multi-hop traversals fast.
  • Visualization with NebulaGraph Studio and Explorer; Python / Java / Go SDKs.

Comparison with Similar Tools

  • Neo4j — Single-writer leader design; Nebula scales horizontally with multi-shard writes.
  • JanusGraph — Depends on Cassandra/HBase; Nebula ships its own distributed storage.
  • Dgraph — GraphQL-native; Nebula chooses nGQL for more flexible graph traversals.
  • TigerGraph — Proprietary; Nebula is Apache-2.0 open source.
  • ArangoDB — Multi-model; Nebula specializes purely in graph for lower latency.

FAQ

Q: What is the largest graph it can handle? A: Production deployments store trillions of edges across 100+ storage nodes with horizontal sharding.

Q: Is nGQL compatible with Cypher? A: Partially. NebulaGraph supports a subset of openCypher syntax alongside its native nGQL clauses, which makes migrations from Neo4j approachable.
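To make the overlap concrete, the sketch below prints the same one-hop lookup in two forms: a native nGQL GO statement and an openCypher-style MATCH statement. The tag `person`, the edge type `follow`, and the vertex ID `p1` are hypothetical. Note that nGQL compares with `==` and reads the vertex ID through `id()`, two details that queries ported from Neo4j usually need to adjust.

```python
def neighbors_go(vid: str, edge_type: str) -> str:
    """Native nGQL form: start from a vertex ID and walk one edge type."""
    return f'GO FROM "{vid}" OVER {edge_type} YIELD dst(edge) AS id;'


def neighbors_match(vid: str, tag: str, edge_type: str) -> str:
    """openCypher-style form of the same lookup, as nGQL accepts it."""
    return (
        f"MATCH (v:{tag})-[:{edge_type}]->(f) "
        f'WHERE id(v) == "{vid}" RETURN id(f) AS id;'
    )


print(neighbors_go("p1", "follow"))
print(neighbors_match("p1", "person", "follow"))
```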

Q: Can I run graph ML on it? A: Yes, via nebula-algorithm (GraphX/Spark) and integrations with DGL and PyTorch Geometric.

Q: How do I back up a cluster? A: Use br (Nebula Backup & Restore) to snapshot meta + storage to S3, GCS, or local disk.
