# NebulaGraph: Distributed Open-Source Graph Database

> Horizontally scalable graph database for storing and querying billions of vertices and trillions of edges with millisecond-level latency.

## Quick Use

```bash
# Docker Compose all-in-one for local dev
git clone https://github.com/vesoft-inc/nebula-docker-compose
cd nebula-docker-compose
docker-compose up -d

# Connect with the nebula console
docker run --rm -ti --network nebula-docker-compose_nebula-net \
  vesoft/nebula-console:v3 -addr graphd -port 9669 -u root -p nebula

# The statements below are nGQL, typed at the console prompt (not in the shell):
# create a space, define a schema, and insert a vertex.
# Space and schema changes are applied asynchronously; if USE or INSERT fails
# right away, wait a few seconds and retry.
CREATE SPACE demo (vid_type=FIXED_STRING(32));
USE demo;
CREATE TAG player(name string, age int);
INSERT VERTEX player(name, age) VALUES "p1":("Alice", 30);
```

## Introduction

NebulaGraph is a distributed, horizontally scalable graph database designed to store and query graphs with hundreds of billions of vertices and trillions of edges at millisecond-level latency. Written in C++ with a shared-nothing architecture, it targets fraud detection, knowledge graphs, recommendation systems, and cybersecurity workloads.

## What NebulaGraph Does

- Stores property graphs (vertices with tags, edges, and properties), replicated through Raft for consistency.
- Executes nGQL, a declarative graph query language with Cypher-like syntax.
- Scales reads and writes horizontally by sharding graph partitions across storage servers.
- Supports multi-graph isolation via "spaces", each with an independent schema.
- Integrates with Spark, Flink, Kafka, and graph analytics libraries for large-scale processing.

## Architecture Overview

Three services run independently: Meta (cluster metadata and schema), Graph (query parsing and planning), and Storage (sharded key-value storage on RocksDB with Raft replication). Partitions are balanced across storage hosts. The Graph service parses and optimizes each query, then dispatches it to the relevant storage replicas, with edge index lookups pushed down to storage.
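To make the partition-based sharding described above concrete, here is a minimal Python sketch of how a vertex ID might be mapped to a partition and how partitions might be spread over storage hosts. It is an illustration under assumptions, not NebulaGraph's implementation: the CRC32 hash, the partition count, the host names, and the round-robin placement are all hypothetical stand-ins.

```python
import zlib
from typing import Sequence

NUM_PARTITIONS = 10  # hypothetical value; a real space fixes its partition count at creation
STORAGE_HOSTS = ("storaged0", "storaged1", "storaged2")  # hypothetical host names


def partition_for(vid: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a vertex ID to a 1-based partition ID by hashing, then taking a modulo."""
    # CRC32 is only a stand-in: it is deterministic and ships with the standard library.
    return zlib.crc32(vid.encode("utf-8")) % num_partitions + 1


def host_for(partition_id: int, hosts: Sequence[str] = STORAGE_HOSTS) -> str:
    """Place partitions on hosts round-robin (an assumed placement policy)."""
    return hosts[(partition_id - 1) % len(hosts)]


if __name__ == "__main__":
    for vid in ("p1", "p2", "p3"):
        pid = partition_for(vid)
        print(f"{vid} -> partition {pid} on {host_for(pid)}")
```

Because the mapping depends only on the vertex ID and the partition count, any node can compute where a vertex lives without a lookup. Adding hosts changes only the partition-to-host placement.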
A separate nebula-algorithm component runs large graph algorithms as Spark jobs.

## Self-Hosting & Configuration

- Install via RPM/DEB packages, Docker Compose, or on Kubernetes with the `nebula-operator` Helm chart.
- A typical production cluster runs 3 meta, 3 or more graph, and 3 or more storage services, so the Meta and Storage Raft groups keep a quorum when one node fails.
- Tune `wal_ttl`, `rocksdb_block_cache`, and `num_io_threads` to match workload characteristics.
- Enable authentication (`--enable_authorize=true`), TLS, and RBAC in `nebula-*.conf`.
- Move data to or from Hive, Neo4j, ClickHouse, CSV, and Parquet with Nebula Exchange.

## Key Features

- Shared-nothing design lets you scale out by adding storage nodes.
- nGQL with openCypher-compatible clauses lowers the learning curve for Neo4j users.
- Geospatial types, full-text search via Elasticsearch, and built-in graph algorithms.
- Vertex and edge filter pushdown, together with RocksDB bloom filters, keeps multi-hop traversals fast.
- Visualization with NebulaGraph Studio and Explorer; client SDKs for Python, Java, and Go.

## Comparison with Similar Tools

- **Neo4j**: single-writer leader design; Nebula scales horizontally with multi-shard writes.
- **JanusGraph**: depends on Cassandra/HBase; Nebula ships its own distributed storage.
- **Dgraph**: GraphQL-native; Nebula chooses nGQL for more flexible graph traversals.
- **TigerGraph**: proprietary; Nebula is Apache-2.0 open source.
- **ArangoDB**: multi-model; Nebula specializes purely in graph for lower latency.

## FAQ

**Q:** What is the largest graph it can handle?
A: Production deployments store trillions of edges across 100+ storage nodes with horizontal sharding.

**Q:** Is nGQL compatible with Cypher?
A: Nebula supports a subset of openCypher syntax, making migrations from Neo4j approachable.

**Q:** Can I run graph ML on it?
A: Yes, via nebula-algorithm (GraphX/Spark) and integrations with DGL and PyTorch Geometric.

**Q:** How do I back up a cluster?
A: Use `br` (Nebula Backup & Restore) to snapshot meta and storage data to S3, GCS, or local disk.
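The Raft-quorum sizing mentioned under Self-Hosting & Configuration comes down to majority arithmetic. A short Python sketch follows; the function names are illustrative and not part of any NebulaGraph tool.

```python
def quorum_size(replicas: int) -> int:
    """Smallest majority of a Raft group with `replicas` members."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Number of members that can be down while a majority is still reachable."""
    return replicas - quorum_size(replicas)


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"{n} replicas: quorum={quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

With three replicas one node can fail. A fourth replica raises the quorum to three without tolerating any more failures, which is why odd replica counts are the usual choice.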
## Sources

- https://github.com/vesoft-inc/nebula
- https://docs.nebula-graph.io

---

Source: https://tokrepo.com/en/workflows/9e58f35f-3931-11f1-9bc6-00163e2b0d79
Author: AI Open Source