Scripts · May 24, 2026 · 3 min read

simdjson — Parsing Gigabytes of JSON per Second

A SIMD-accelerated JSON parser that processes structured data at the speed of your CPU, outperforming conventional parsers by 4-10x.

Universal CLI command:
npx tokrepo install 5c5d4293-578c-11f1-9bc6-00163e2b0d79

Introduction

simdjson is a C++ library that leverages SIMD (Single Instruction, Multiple Data) CPU instructions to parse JSON at multiple gigabytes per second. It was designed by Daniel Lemire and collaborators to prove that JSON parsing does not need to be a bottleneck in data-intensive applications.

What simdjson Does

  • Parses JSON documents using hardware-accelerated SIMD instructions on x86 and ARM
  • Provides an on-demand API that only materializes values you actually access
  • Validates UTF-8 encoding and JSON structure in a single pass
  • Handles documents up to 4 GB with minimal memory allocation
  • Supports JSON Pointer for targeted field extraction

Architecture Overview

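simdjson operates in two stages. Stage 1 uses SIMD instructions to scan the input in wide blocks, validating UTF-8 and locating every structural character (brackets, braces, colons, commas) and string boundary in parallel, and records their positions in a structural index. Stage 2 walks that index to validate and extract values. With the on-demand API this walk happens lazily as you access fields, so no full tree is built unless you ask for one through the DOM API.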
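
To make the stage 1 idea concrete, here is a minimal scalar sketch of structural indexing. It is not simdjson's implementation: the real library classifies many bytes per instruction and does more work in this stage, while this version walks one byte at a time. The function name `structural_index` is hypothetical.

```cpp
#include <cstdint>
#include <string_view>
#include <vector>

// Records the offsets of structural characters and opening quotes,
// ignoring anything that appears inside a string literal.
std::vector<std::uint32_t> structural_index(std::string_view json) {
    std::vector<std::uint32_t> index;
    bool in_string = false;  // currently between an opening and closing quote
    bool escaped = false;    // previous character was an unescaped backslash
    for (std::uint32_t i = 0; i < json.size(); ++i) {
        const char c = json[i];
        if (in_string) {
            if (escaped) {
                escaped = false;          // this character is escaped; skip it
            } else if (c == '\\') {
                escaped = true;
            } else if (c == '"') {
                in_string = false;        // closing quote ends the string
            }
            continue;
        }
        switch (c) {
            case '"':
                in_string = true;
                index.push_back(i);       // string boundary (opening quote)
                break;
            case '{': case '}': case '[': case ']': case ':': case ',':
                index.push_back(i);       // structural character
                break;
            default:
                break;                    // whitespace, digits, literals
        }
    }
    return index;
}
```

A second pass can then jump from one recorded offset to the next instead of re-reading every byte, which is the role the text assigns to stage 2.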

Self-Hosting & Configuration

  • Amalgamated two-file build: copy simdjson.h and simdjson.cpp into your project and compile simdjson.cpp alongside your own sources (the library is not header-only)
  • CMake integration via FetchContent or find_package
  • Automatically selects the best available SIMD backend at runtime (for example icelake, haswell, westmere, arm64, or the scalar fallback)
  • Compile with -O2 or higher for optimal vectorized codegen
  • Available through vcpkg, Conan, and system package managers
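
The FetchContent route mentioned above might look like the following configuration fragment. The target name `myapp`, the source file `main.cpp`, and the pinned tag are assumptions for illustration; pin whichever release you have actually validated.

```cmake
include(FetchContent)

FetchContent_Declare(
  simdjson
  GIT_REPOSITORY https://github.com/simdjson/simdjson.git
  GIT_TAG        v3.0.0      # assumption: replace with the release you want to pin
  GIT_SHALLOW    TRUE)
FetchContent_MakeAvailable(simdjson)

add_executable(myapp main.cpp)                  # hypothetical application target
target_link_libraries(myapp PRIVATE simdjson)
```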

Key Features

  • Processes over 3 GB/s of JSON on modern hardware; actual throughput depends on the CPU and on the shape of the documents
  • On-demand parsing avoids allocating a full DOM tree
  • Fully validates documents per RFC 8259 including UTF-8
  • Thread-safe when each thread owns its own parser instance
  • Bindings or ports available for Rust, Python, C#, Go, and Node.js

Comparison with Similar Tools

  • RapidJSON: fast DOM/SAX parser, but reported as roughly 3-5x slower than simdjson; results depend on the benchmark and hardware
  • nlohmann/json: developer-friendly API that prioritizes ergonomics over raw speed
  • yyjson: C-based parser with competitive speed; simdjson typically leads where SIMD scanning dominates, though results vary by workload
  • sajson: single-allocation parser; lacks on-demand mode and SIMD acceleration

FAQ

Q: Does simdjson require special hardware? A: It runs on any x86-64 or ARM64 processor. A scalar fallback exists for older CPUs, though at reduced speed.

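Q: Can I use simdjson in a C project? A: simdjson is a C++ library and its public API is C++. C code typically reaches it through a thin extern "C" wrapper that is compiled as C++ and linked into the C program. The amalgamated simdjson.h and simdjson.cpp pair keeps that build simple in mixed-language projects.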

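Q: Is simdjson safe for untrusted input? A: Yes. It validates structure and UTF-8 encoding and reports malformed documents through error codes or exceptions rather than undefined behavior. With the on-demand API, a value is checked when you reach it, so handle errors throughout iteration and not only at the initial parse call.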

Q: How does the on-demand API differ from DOM parsing? A: On-demand iterates through the document lazily, only materializing the fields you access, reducing memory use significantly for large files.
