Introduction
simdjson is a C++ library that leverages SIMD (Single Instruction, Multiple Data) CPU instructions to parse JSON at multiple gigabytes per second. It was designed by Daniel Lemire and collaborators to prove that JSON parsing does not need to be a bottleneck in data-intensive applications.
What simdjson Does
- Parses JSON documents using hardware-accelerated SIMD instructions on x86 and ARM
- Provides an on-demand API that only materializes values you actually access
- Validates UTF-8 encoding and JSON structure as part of parsing, with no separate validation step
- Handles documents up to 4 GB with minimal memory allocation
- Supports JSON Pointer for targeted field extraction
Architecture Overview
simdjson operates in two stages. Stage 1 performs structural classification using SIMD to identify all brackets, braces, colons, and string boundaries in parallel. Stage 2 walks the resulting structural index to validate and extract values on demand, avoiding full tree construction unless the user requests it.
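To make the idea of a structural index concrete, here is a scalar sketch that uses only the standard library. It is not simdjson's implementation: the real Stage 1 classifies 64 bytes at a time with SIMD instructions and bitmask arithmetic, while this loop visits one byte at a time. The function name structural_index is hypothetical, and for brevity it records only the opening quote of each string.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Scalar illustration only: returns the offsets of structural characters
// ({ } [ ] : ,) that lie outside strings, plus the opening quote of each string.
std::vector<std::size_t> structural_index(std::string_view json) {
    std::vector<std::size_t> index;
    bool in_string = false;  // currently between an opening and a closing quote
    bool escaped = false;    // previous character was an unescaped backslash
    for (std::size_t i = 0; i < json.size(); ++i) {
        const char c = json[i];
        if (in_string) {
            if (escaped) { escaped = false; }
            else if (c == '\\') { escaped = true; }
            else if (c == '"') { in_string = false; }
            continue;  // bytes inside a string are never structural
        }
        switch (c) {
            case '"':
                in_string = true;
                index.push_back(i);
                break;
            case '{': case '}': case '[': case ']': case ':': case ',':
                index.push_back(i);
                break;
            default:
                break;
        }
    }
    return index;
}
```

For the input {"a":[1,"b:c"]} the result is the offsets 0, 1, 4, 5, 7, 8, 13, 14; the colon inside "b:c" is skipped because it sits inside a string. A second stage can then walk these offsets instead of rescanning every byte.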
Self-Hosting & Configuration
- Amalgamated single-header build: copy simdjson.h and simdjson.cpp into your project and compile simdjson.cpp alongside your own sources (the library is not header-only)
- CMake integration via FetchContent or find_package
- Selects the best SIMD backend supported by the CPU at runtime (for example haswell, westmere, arm64, or the scalar fallback)
- Compile with -O2 or higher for optimal vectorized codegen
- Available through vcpkg, Conan, and system package managers
Key Features
- Processes over 3 GB/s of JSON on modern hardware
- On-demand parsing avoids allocating a full DOM tree
- Fully validates documents per RFC 8259 including UTF-8
- Thread-safe when each thread owns its own parser instance
- Bindings available for Rust, Python, C#, Go, and Node.js
Comparison with Similar Tools
- RapidJSON — fast DOM/SAX parser but 3-5x slower than simdjson on benchmarks
- nlohmann/json — developer-friendly API but prioritizes ergonomics over raw speed
- yyjson — C-based parser with competitive speed; simdjson typically leads on SIMD-heavy workloads
- sajson — single-allocation parser; lacks on-demand mode and SIMD acceleration
FAQ
Q: Does simdjson require special hardware? A: It runs on any x86-64 or ARM64 processor. A scalar fallback exists for older CPUs, though at reduced speed.
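Q: Can I use simdjson in a C project? A: The library itself is C++, so C code reaches it through a wrapper: a small extern "C" shim compiled as C++ and linked into the C program. The amalgamated simdjson.h and simdjson.cpp pair keeps that straightforward in mixed-language projects.
Q: Is simdjson safe for untrusted input? A: Yes. Malformed documents produce an error rather than undefined behavior. The DOM API validates the whole document's structure and encoding up front. The on-demand API validates UTF-8 for the whole input but checks individual values only as you access them, so errors in parts you skip may go unreported.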
Q: How does the on-demand API differ from DOM parsing? A: On-demand iterates through the document lazily, only materializing the fields you access, reducing memory use significantly for large files.