Scripts · May 24, 2026 · 3 min read

simdjson — Parsing Gigabytes of JSON per Second

A SIMD-accelerated JSON parser that processes structured data at the speed of your CPU, outperforming conventional parsers by 4-10x.

Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, an adapter-specific plan, and the raw content to help agents assess fit, risk, and next steps.

Native · 98/100 · Policy: allow
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established
Entry point: simdjson Overview
Universal CLI command:
npx tokrepo install 5c5d4293-578c-11f1-9bc6-00163e2b0d79

Introduction

simdjson is a C++ library that leverages SIMD (Single Instruction, Multiple Data) CPU instructions to parse JSON at multiple gigabytes per second. It was designed by Daniel Lemire and collaborators to prove that JSON parsing does not need to be a bottleneck in data-intensive applications.

What simdjson Does

  • Parses JSON documents using hardware-accelerated SIMD instructions on x86 and ARM
  • Provides an on-demand API that only materializes values you actually access
  • Validates UTF-8 encoding and JSON structure in a single pass
  • Handles documents up to 4 GB with minimal memory allocation
  • Supports JSON Pointer for targeted field extraction

Architecture Overview

simdjson operates in two stages. Stage 1 performs structural classification using SIMD to identify all brackets, braces, colons, and string boundaries in parallel. Stage 2 walks the resulting structural index to validate and extract values on demand, avoiding full tree construction unless the user requests it.
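
To make the idea of a structural index concrete, here is a scalar sketch of what stage 1 produces as described above: the byte offsets of brackets, braces, colons, commas, and string boundaries. This is not simdjson's code. The library computes its index with SIMD bitmasks over blocks of input, while this hypothetical structural_index function walks one byte at a time and exists only to show the shape of the output.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Scalar stand-in for stage 1 (illustrative only, not simdjson's implementation).
// Records the offset of every structural character and every unescaped quote.
std::vector<std::size_t> structural_index(std::string_view json) {
    std::vector<std::size_t> index;
    bool in_string = false;  // currently between an opening and closing quote
    bool escaped = false;    // previous byte was an unconsumed backslash
    for (std::size_t i = 0; i < json.size(); ++i) {
        const char c = json[i];
        if (in_string) {
            if (escaped) {
                escaped = false;           // this byte is escaped, skip it
            } else if (c == '\\') {
                escaped = true;            // next byte is escaped
            } else if (c == '"') {
                in_string = false;         // closing quote is a boundary
                index.push_back(i);
            }
            continue;                      // structurals inside strings are ignored
        }
        switch (c) {
            case '"':
                in_string = true;          // opening quote is a boundary
                index.push_back(i);
                break;
            case '{': case '}': case '[': case ']': case ':': case ',':
                index.push_back(i);
                break;
            default:
                break;                     // whitespace and scalar bytes
        }
    }
    return index;
}
```

Stage 2 can then jump from one recorded offset to the next instead of rescanning every byte, which is why a comma inside a string such as "b,c" must not appear in the index.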

Self-Hosting & Configuration

  • Amalgamated single-header build: copy simdjson.h and simdjson.cpp into your project and compile simdjson.cpp alongside your own sources (the library is not header-only)
  • CMake integration via FetchContent or find_package
  • Automatically selects the best SIMD backend at runtime (haswell, westmere, arm64, fallback)
  • Compile with -O2 or higher; the SIMD kernels depend on inlining and run far slower in unoptimized builds
  • Available through vcpkg, Conan, and system package managers

Key Features

  • Processes over 3 GB/s of JSON on modern hardware
  • On-demand parsing avoids allocating a full DOM tree
  • Fully validates documents per RFC 8259 including UTF-8
  • Thread-safe when each thread owns its own parser instance
  • Bindings and ports exist for Rust, Python, C#, Go, and Node.js

Comparison with Similar Tools

  • RapidJSON — fast DOM/SAX parser but 3-5x slower than simdjson on benchmarks
  • nlohmann/json — developer-friendly API but prioritizes ergonomics over raw speed
  • yyjson — C-based parser with competitive speed; simdjson typically leads on SIMD-heavy workloads
  • sajson — single-allocation parser; lacks on-demand mode and SIMD acceleration

FAQ

Q: Does simdjson require special hardware? A: It runs on any x86-64 or ARM64 processor. A scalar fallback exists for older CPUs, though at reduced speed.

Q: Can I use simdjson in a C project? A: The core is C++. C projects typically reach it through a thin extern "C" wrapper compiled as C++, and the amalgamated simdjson.h and simdjson.cpp pair keeps that wrapper easy to build in mixed-language projects.

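Q: Is simdjson safe for untrusted input? A: Yes. It validates structure and UTF-8 encoding and returns error codes on malformed documents rather than invoking undefined behavior. Note that the on-demand API validates lazily, so errors in parts of a document you never access may go unreported; use the DOM API when the whole document must be checked up front.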

Q: How does the on-demand API differ from DOM parsing? A: On-demand iterates through the document lazily, only materializing the fields you access, reducing memory use significantly for large files.
