Scripts · May 24, 2026 · 1 min read

simdjson — Parsing Gigabytes of JSON per Second

A SIMD-accelerated JSON parser that processes structured data at gigabytes per second, roughly 4-10x faster than conventional parsers depending on the document and the baseline.

Universal CLI install command
npx tokrepo install 5c5d4293-578c-11f1-9bc6-00163e2b0d79

Introduction

simdjson is a C++ library that uses SIMD (Single Instruction, Multiple Data) CPU instructions to parse JSON at multiple gigabytes per second. It was designed by Daniel Lemire and collaborators to show that JSON parsing need not be a bottleneck in data-intensive applications.

What simdjson Does

  • Parses JSON documents using hardware-accelerated SIMD instructions on x86 and ARM
  • Provides an on-demand API that only materializes values you actually access
  • Validates UTF-8 encoding and JSON structure in a single pass
  • Handles documents up to 4 GB with minimal memory allocation
  • Supports JSON Pointer for targeted field extraction
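To make the JSON Pointer bullet concrete: a pointer such as /user/name is a sequence of reference tokens separated by "/", where "~1" stands for "/" and "~0" stands for "~" (RFC 6901). The sketch below only shows how such a path breaks into tokens. It is not simdjson's implementation, and the helper name split_json_pointer is hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of simdjson): split an RFC 6901 JSON Pointer
// into its unescaped reference tokens.
std::vector<std::string> split_json_pointer(const std::string& pointer) {
    std::vector<std::string> tokens;
    if (pointer.empty()) {
        return tokens;  // the empty pointer refers to the whole document
    }
    if (pointer[0] != '/') {
        throw std::invalid_argument("JSON Pointer must be empty or start with '/'");
    }
    std::string current;
    for (std::size_t i = 1; i < pointer.size(); ++i) {
        const char c = pointer[i];
        if (c == '/') {
            // A slash ends the current token and starts the next one.
            tokens.push_back(current);
            current.clear();
        } else if (c == '~') {
            // "~0" decodes to '~' and "~1" decodes to '/'; anything else is invalid.
            if (i + 1 >= pointer.size()) {
                throw std::invalid_argument("dangling '~' in JSON Pointer");
            }
            const char next = pointer[++i];
            if (next == '0') {
                current += '~';
            } else if (next == '1') {
                current += '/';
            } else {
                throw std::invalid_argument("invalid escape in JSON Pointer");
            }
        } else {
            current += c;
        }
    }
    tokens.push_back(current);  // the final token has no trailing slash
    return tokens;
}
```

Each token is then applied in order: an object key lookup for objects, or an array index for arrays. Splitting "/a~1b/m~0n" yields the two keys "a/b" and "m~n".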

Architecture Overview

simdjson operates in two stages. Stage 1 performs structural classification using SIMD, identifying all brackets, braces, colons, commas, and string boundaries in parallel and recording their positions in a structural index. Stage 2 walks that index to validate and extract values on demand, avoiding full tree construction unless the user requests it.
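The idea behind the structural index can be illustrated with a scalar, byte-at-a-time sketch. This is an assumption-laden simplification and not simdjson's code: the real stage 1 classifies many bytes per instruction, while the hypothetical function structural_index below only shows what ends up in the index (offsets of structural characters and opening quotes, with string contents skipped).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Scalar illustration of a stage-1 style pass (hypothetical, not simdjson code).
// Records the byte offset of every structural character ({ } [ ] : ,) and of
// every opening quote, ignoring anything that appears inside a string.
std::vector<std::size_t> structural_index(const std::string& json) {
    std::vector<std::size_t> positions;
    bool in_string = false;  // currently between an opening and closing quote
    bool escaped = false;    // previous byte inside the string was a backslash
    for (std::size_t i = 0; i < json.size(); ++i) {
        const char c = json[i];
        if (in_string) {
            if (escaped) {
                escaped = false;       // this byte is consumed by the escape
            } else if (c == '\\') {
                escaped = true;        // next byte cannot close the string
            } else if (c == '"') {
                in_string = false;     // unescaped quote closes the string
            }
            continue;
        }
        switch (c) {
            case '"':
                in_string = true;
                positions.push_back(i);
                break;
            case '{': case '}': case '[': case ']': case ':': case ',':
                positions.push_back(i);
                break;
            default:
                break;  // whitespace and scalar bytes are not indexed in this sketch
        }
    }
    return positions;
}
```

For the input {"a":[1,"x:,"]} this returns the offsets 0, 1, 4, 5, 7, 8, 13, 14: the colon and comma inside the string "x:," are not indexed. A second pass can then jump from one offset to the next instead of re-scanning every byte.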

Self-Hosting & Configuration

  • Amalgamated two-file build: copy simdjson.h and simdjson.cpp into your project and compile simdjson.cpp alongside your own sources
  • CMake integration via FetchContent or find_package
  • Automatically detects the best SIMD backend at runtime (for example haswell, westmere, arm64, or the scalar fallback)
  • Compile with -O2 or higher for optimal vectorized codegen
  • Available through vcpkg, Conan, and system package managers
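The backend detection mentioned above amounts to checking CPU features once and dispatching to the fastest matching implementation. A minimal sketch of that idea follows, assuming GCC or Clang on x86 for the feature checks. The function guess_backend is hypothetical, and the mapping is deliberately simplified: the library itself likely checks more feature bits than the two tested here.

```cpp
#include <string>

// Hypothetical sketch of runtime backend selection (not simdjson's code).
// Simplifying assumption: AVX2 maps to "haswell" and SSE4.2 to "westmere".
std::string guess_backend() {
#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();  // populate the compiler's CPU feature cache
    if (__builtin_cpu_supports("avx2")) {
        return "haswell";
    }
    if (__builtin_cpu_supports("sse4.2")) {
        return "westmere";
    }
    return "fallback";
#elif defined(__aarch64__) || defined(_M_ARM64)
    return "arm64";  // NEON is a baseline feature on 64-bit ARM
#else
    return "fallback";  // unknown compiler or architecture: use scalar code
#endif
}
```

Because the choice is made at runtime, a single binary can run on both older and newer CPUs and still use the widest instructions available on each.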

Key Features

  • Can process over 3 GB/s of JSON on modern hardware, depending on the document and CPU
  • On-demand parsing avoids allocating a full DOM tree
  • Fully validates documents per RFC 8259 including UTF-8
  • Thread-safe when each thread owns its own parser instance
  • Bindings and ports available for Rust, Python, C#, Go, and Node.js
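The UTF-8 validation listed above has to reject more than malformed byte sequences: overlong encodings, surrogate code points, and values above U+10FFFF are invalid too. The scalar sketch below spells out those rules. It is an illustration only; simdjson performs this check with SIMD instructions, and the function name is_valid_utf8 is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Scalar UTF-8 validator (hypothetical illustration, not simdjson code).
bool is_valid_utf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const std::uint8_t b0 = static_cast<std::uint8_t>(s[i]);
        if (b0 < 0x80) {  // ASCII: one byte, always valid
            ++i;
            continue;
        }
        std::size_t len = 0;
        std::uint32_t cp = 0;
        if ((b0 & 0xE0) == 0xC0) {
            len = 2; cp = b0 & 0x1F;
        } else if ((b0 & 0xF0) == 0xE0) {
            len = 3; cp = b0 & 0x0F;
        } else if ((b0 & 0xF8) == 0xF0) {
            len = 4; cp = b0 & 0x07;
        } else {
            return false;  // stray continuation byte or invalid lead byte
        }
        if (i + len > n) {
            return false;  // sequence is truncated
        }
        for (std::size_t k = 1; k < len; ++k) {
            const std::uint8_t b = static_cast<std::uint8_t>(s[i + k]);
            if ((b & 0xC0) != 0x80) {
                return false;  // continuation bytes must look like 10xxxxxx
            }
            cp = (cp << 6) | (b & 0x3F);
        }
        // Reject overlong encodings: each length has a minimum code point.
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
            (len == 4 && cp < 0x10000)) {
            return false;
        }
        // Reject values beyond Unicode and UTF-16 surrogates.
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            return false;
        }
        i += len;
    }
    return true;
}
```

For example, the two-byte sequence C0 AF is rejected as an overlong encoding of "/", and ED A0 80 is rejected because it encodes the surrogate U+D800.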

Comparison with Similar Tools

  • RapidJSON: fast DOM/SAX parser, but roughly 3-5x slower than simdjson on typical benchmarks
  • nlohmann/json: developer-friendly API that prioritizes ergonomics over raw speed
  • yyjson: C-based parser with competitive speed; simdjson typically leads on SIMD-heavy workloads
  • sajson: single-allocation parser that lacks an on-demand mode and SIMD acceleration

FAQ

Q: Does simdjson require special hardware? A: It runs on any x86-64 or ARM64 processor. A scalar fallback exists for older CPUs, though at reduced speed.

Q: Can I use simdjson in a C project? A: The core is C++, so C code reaches it through a thin extern "C" wrapper. The amalgamated simdjson.h and simdjson.cpp pair integrates easily into mixed-language projects.

Q: Is simdjson safe for untrusted input? A: Yes. It fully validates structure and encoding, returning errors on malformed documents without undefined behavior.

Q: How does the on-demand API differ from DOM parsing? A: On-demand iterates through the document lazily, only materializing the fields you access, reducing memory use significantly for large files.
