Scripts · May 24, 2026 · 1 min read

simdjson — Parsing Gigabytes of JSON per Second

A SIMD-accelerated JSON parser that processes gigabytes of JSON per second, with reported speedups of roughly 4-10x over conventional parsers.

Direct install command
npx -y tokrepo@latest install 5c5d4293-578c-11f1-9bc6-00163e2b0d79 --target codex

Do a dry run first to confirm the install plan, then run this command.

Introduction

simdjson is a C++ library that leverages SIMD (Single Instruction, Multiple Data) CPU instructions to parse JSON at multiple gigabytes per second. It was designed by Daniel Lemire and collaborators to prove that JSON parsing does not need to be a bottleneck in data-intensive applications.

What simdjson Does

  • Parses JSON documents using hardware-accelerated SIMD instructions on x86 and ARM
  • Provides an on-demand API that only materializes values you actually access
  • Validates UTF-8 encoding and JSON structure in a single pass
  • Handles documents up to 4 GB with minimal memory allocation
  • Supports JSON Pointer for targeted field extraction

Architecture Overview

simdjson operates in two stages. Stage 1 performs structural classification using SIMD to identify all brackets, braces, colons, and string boundaries in parallel. Stage 2 walks the resulting structural index to validate and extract values on demand, avoiding full tree construction unless the user requests it.

Self-Hosting & Configuration

  • Amalgamated two-file build: copy simdjson.h and simdjson.cpp into your project and compile simdjson.cpp with your sources (the library is not header-only)
  • CMake integration via FetchContent or find_package
  • Selects the best available SIMD backend at runtime (for example haswell, westmere, arm64, or the scalar fallback)
  • Compile with -O2 or higher for optimal vectorized codegen
  • Available through vcpkg, Conan, and system package managers

Key Features

  • Processes over 3 GB/s of JSON on modern hardware
  • On-demand parsing avoids allocating a full DOM tree
  • Fully validates documents per RFC 8259 including UTF-8
  • Thread-safe when each thread owns its own parser instance
  • Bindings available for Rust, Python, C#, Go, and Node.js

Comparison with Similar Tools

  • RapidJSON — fast DOM/SAX parser but 3-5x slower than simdjson on benchmarks
  • nlohmann/json — developer-friendly API but prioritizes ergonomics over raw speed
  • yyjson — C-based parser with competitive speed; simdjson typically leads on SIMD-heavy workloads
  • sajson — single-allocation parser; lacks on-demand mode and SIMD acceleration

FAQ

Q: Does simdjson require special hardware? A: It runs on any x86-64 or ARM64 processor. A scalar fallback exists for older CPUs, though at reduced speed.

Q: Can I use simdjson in a C project? A: The core is C++, but a C API wrapper exists, and the amalgamated build (simdjson.h plus simdjson.cpp) integrates easily into mixed-language projects.

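Q: Is simdjson safe for untrusted input? A: Yes. Malformed documents produce errors rather than undefined behavior. The DOM API validates structure and encoding for the whole document during parsing. The on-demand API validates UTF-8 up front and checks the remaining syntax as you traverse, so values you skip are only partially validated.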

Q: How does the on-demand API differ from DOM parsing? A: On-demand iterates through the document lazily, only materializing the fields you access, reducing memory use significantly for large files.
