HNSWlib — Fast Approximate Nearest Neighbor Search in C++ and Python

Introduction

HNSWlib is a C++ library implementing the Hierarchical Navigable Small World (HNSW) graph algorithm for approximate nearest neighbor search. It provides Python bindings and is widely used as the indexing engine behind vector databases and embedding-based search systems.

What HNSWlib Does

Builds an in-memory HNSW index for fast approximate nearest neighbor queries
Supports cosine, inner product, and squared L2 distance metrics (the 'cosine', 'ip', and 'l2' spaces)
Handles incremental insertions without full index rebuilds
Provides Python bindings with NumPy array support for easy integration
Saves and loads index state to disk for persistence across restarts

Architecture Overview

HNSW constructs a multi-layer proximity graph. The bottom layer contains every data point, and each higher layer holds a progressively smaller random subset, with each point linked to nearby neighbors within its layer. The top layers are sparse for fast long-range navigation, while the bottom layer is dense for precise local search. A query enters at the top layer, greedily moves to the closest node it can reach, and descends layer by layer; at the bottom layer it runs a wider search whose breadth is controlled by ef. In practice, this hierarchy gives search cost that grows roughly logarithmically with dataset size.
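The layer structure described above comes from drawing a random top layer for each inserted element from an exponentially decaying distribution. The sketch below is a minimal pure-Python illustration of that idea, assuming the common normalization 1/ln(M); the function names `draw_level` and `layer_population` are hypothetical and are not part of the HNSWlib API.

```python
import math
import random
from collections import Counter


def draw_level(M, rng):
    """Draw the top layer for a new element as floor(-ln(U) / ln(M))."""
    u = 1.0 - rng.random()  # uniform in (0, 1], so log(0) cannot occur
    return int(-math.log(u) / math.log(M))


def layer_population(n, M, seed=0):
    """Count how many of n elements are present on each layer.

    An element whose top layer is L is present on every layer 0..L.
    """
    rng = random.Random(seed)
    tops = Counter(draw_level(M, rng) for _ in range(n))
    max_level = max(tops)
    return [
        sum(count for level, count in tops.items() if level >= layer)
        for layer in range(max_level + 1)
    ]


population = layer_population(100_000, M=16)
for layer, count in enumerate(population):
    print(f"layer {layer}: {count} elements")
```

Under this distribution the expected share of elements reaching layer l is M^(-l), so with M = 16 roughly one element in sixteen appears on layer 1. That thinning is what keeps the upper layers sparse enough for long-range hops.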

Self-Hosting & Configuration

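Install the Python package with pip (pip install hnswlib), or add the header-only hnswlib sources to the include path of a C++ project
Set M (number of bidirectional links created per element) and ef_construction (size of the candidate list used while building) during index creation
Tune ef (size of the candidate list used while searching) to trade recall against speed; ef must be at least as large as the number of neighbors k requested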
Pre-allocate max_elements to avoid costly resizing operations
Serialize the index to a single binary file for fast loading on restart

Key Features

Sub-millisecond query times on million-scale datasets with high recall
Header-only C++ implementation with zero external dependencies
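Multithreaded insertion and querying with fine-grained locking; in the Python bindings, add_items and knn_query are each safe to call concurrently with themselves but should not be mixed with one another
Memory footprint determined mainly by vector dimensionality and the M connection limit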
Foundation behind many vector databases including Chroma and Weaviate

Comparison with Similar Tools

FAISS — More feature-rich with GPU support and multiple index types; HNSWlib is simpler with competitive single-node performance
Annoy — Tree-based approach optimized for static datasets; HNSWlib supports dynamic insertions
ScaNN — Google's library with hardware-optimized quantization; HNSWlib is more portable
USearch — Newer alternative with wider language bindings; HNSWlib has a more established ecosystem

FAQ

Q: How many vectors can HNSWlib handle? A: It scales to tens of millions of vectors in-memory; exact limits depend on available RAM and vector dimensions.
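To turn "depends on available RAM and vector dimensions" into a number, a back-of-the-envelope estimate can help. The sketch below assumes float32 vectors plus the rule of thumb of roughly 8 to 10 bytes of link storage per unit of M per element; the helper name `estimate_index_bytes` is hypothetical, and real usage will differ with allocator overhead and library version.

```python
def estimate_index_bytes(num_vectors, dim, M, link_bytes_per_m=10.0):
    """Rough in-memory size of an HNSW index, in bytes.

    Assumes float32 vector storage and about `link_bytes_per_m` bytes of
    graph links per unit of M for each element (an approximation).
    """
    vector_bytes = 4 * dim             # float32 components
    link_bytes = link_bytes_per_m * M  # graph connectivity overhead
    return num_vectors * (vector_bytes + link_bytes)


gib = estimate_index_bytes(10_000_000, dim=768, M=16) / 2**30
print(f"10M x 768-d vectors with M=16: about {gib:.1f} GiB")
```

For high-dimensional embeddings the vector storage dominates, so reducing dimensionality usually saves more memory than lowering M.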

Q: Can I add new vectors after building the index? A: Yes, HNSWlib supports incremental insertions. Deletions are supported via element marking and ID replacement.

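Q: What recall can I expect? A: Recall depends on M, ef_construction, ef, and the dataset. With well-tuned parameters, HNSWlib achieves 95-99% recall on standard benchmarks, and raising ef trades query speed for higher recall.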

Q: Is GPU acceleration supported? A: No, HNSWlib is CPU-only. For GPU-accelerated search, consider FAISS or cuVS.
