Configs · July 2, 2026 · 1 min read

Apache Avro — Schema-Based Data Serialization System

Apache Avro is a compact binary serialization framework with rich schema support, schema evolution, and deep integration with the Hadoop and Kafka ecosystems.

Introduction

Apache Avro is a data serialization system that uses JSON-defined schemas to produce compact binary data. It is one of the most common serialization formats for Apache Kafka, particularly alongside a schema registry, and is widely used throughout the Hadoop ecosystem for data storage, RPC, and schema evolution.

What Apache Avro Does

  • Serializes structured data into a compact binary format using JSON-defined schemas
  • Supports forward and backward schema evolution without breaking consumers
  • Provides optional code generation for statically typed languages such as Java, C++, and C#; dynamic languages such as Python read the schema at runtime
  • Includes an RPC framework for building schema-aware network services
  • Integrates natively with Kafka, Hadoop, Spark, Flink, and Hive

Architecture Overview

Avro schemas are defined in JSON and describe record types with named fields, each with a type. The binary encoding writes field values in schema-declared order without field tags, which typically produces smaller payloads than tagged formats but means a reader needs the writer schema to decode the data. The writer schema and the reader schema are resolved against each other at deserialization time, which is what enables schema evolution. Container files embed the writer schema in the file header, so the files are self-describing and any reader can decode them. The Schema Registry pattern (used with Kafka) stores schemas centrally and embeds only a schema ID in each message.
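The tag-free encoding described above can be illustrated with a minimal pure-Python sketch. It is not the Avro library; it hand-implements only the `long` and `string` encodings (zigzag varint and length-prefixed UTF-8) for a hypothetical `User` schema, and compares the result with the same record as JSON text.

```python
import json

# Hypothetical example schema; the record and field names are illustrative only.
SCHEMA = json.loads("""
{"type": "record", "name": "User",
 "fields": [{"name": "name", "type": "string"},
            {"name": "age", "type": "long"}]}
""")

def encode_long(n: int) -> bytes:
    """Encode a 64-bit integer as zigzag followed by a base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag: small negatives become small unsigned values
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)

def encode_string(s: str) -> bytes:
    """Encode a string as its byte length (a long) followed by UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data

def encode_record(schema: dict, record: dict) -> bytes:
    """Write field values in schema order, with no names or tags in the output."""
    out = bytearray()
    for field in schema["fields"]:
        value = record[field["name"]]
        if field["type"] in ("int", "long"):
            out += encode_long(value)
        elif field["type"] == "string":
            out += encode_string(value)
        else:
            raise NotImplementedError(f"type not covered by this sketch: {field['type']}")
    return bytes(out)

record = {"name": "Al", "age": 30}
payload = encode_record(SCHEMA, record)
json_size = len(json.dumps(record).encode("utf-8"))
print(len(payload), "bytes as binary vs", json_size, "bytes as JSON")
```

Because the payload carries no field names, it can only be decoded by a reader that knows the schema it was written with, which is why container files and schema registries exist.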

Self-Hosting & Configuration

  • Define schemas as JSON files with record types, fields, and types
  • Generate language-specific classes using the avro-tools CLI or Maven/Gradle plugin
  • Use GenericRecord for dynamic schema handling without code generation
  • Deploy a Schema Registry (like Confluent Schema Registry) alongside Kafka for centralized schema management
  • Configure compatibility rules (BACKWARD, FORWARD, FULL) to enforce safe evolution
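When a registry is deployed, each Kafka message usually carries a small header instead of the full schema. The sketch below assumes the framing documented for Confluent's serializers (one zero "magic" byte, a 4-byte big-endian schema ID, then the Avro binary payload). The helper names and the schema ID 42 are hypothetical, and no registry lookup is performed.

```python
import struct

MAGIC_BYTE = 0  # assumed framing: a single zero byte precedes the schema ID

def frame_message(schema_id: int, avro_payload: bytes) -> bytes:
    """Prefix an Avro payload with the magic byte and a big-endian schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_payload

def unframe_message(message: bytes) -> tuple[int, bytes]:
    """Split a framed message into (schema_id, avro_payload)."""
    if len(message) < 5:
        raise ValueError("message is shorter than the 5-byte header")
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[5:]

# Hypothetical payload: an already-encoded Avro record.
framed = frame_message(42, b"\x04Al\x3c")
schema_id, payload = unframe_message(framed)
print(schema_id, payload)
```

A consumer would use the extracted ID to fetch the writer schema from the registry before decoding the payload.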

Key Features

  • Compact binary format with no per-field tags reduces payload size
  • Schema evolution with backward and forward compatibility, provided changes follow the compatibility rules
  • Self-describing container files embed the schema for standalone use
  • Language-neutral: libraries exist for Java, Python, C, C++, C#, Ruby, and more
  • Widely used serialization format in the Apache Kafka and Hadoop ecosystems

Comparison with Similar Tools

  • Protocol Buffers: uses numbered field tags for evolution; Avro resolves writer and reader schemas instead and omits tags, which often yields smaller payloads
  • JSON: human-readable but verbose; Avro is binary and significantly more compact
  • MessagePack: schema-less binary; Avro enforces schemas for type safety and evolution
  • Thrift: bundles RPC and transport layers and identifies fields by numeric ID; Avro also defines an RPC protocol but is used mainly for serialization and evolves through schema resolution
  • Parquet: columnar storage format; Avro is row-oriented and used for serialization and messaging

FAQ

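Q: Why is Avro so widely used with Kafka? A: Kafka itself treats messages as opaque bytes and does not mandate a format. Avro is a popular choice because it combines compact binary encoding with schema evolution support, and the Schema Registry pattern lets producers and consumers evolve independently while maintaining compatibility.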

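Q: How does schema evolution work? A: Writers and readers can use different schema versions, and the reader resolves the writer schema against its own. A reader can handle data that lacks a field if its schema declares a default for that field, and it ignores writer fields it does not know. Changes stay safe as long as the configured compatibility rules are followed.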
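The resolution rule in this answer can be sketched in a few lines of Python. This is a simplified illustration that works on an already-decoded record (a dict) rather than on binary data, and it ignores aliases and type promotion, which the real resolution rules also cover. The schemas and the `resolve_record` helper are hypothetical.

```python
# Reader schema (v2): adds "email" with a default and no longer declares "age".
READER_SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "email", "type": ["null", "string"], "default": None},
    ],
}

def resolve_record(decoded: dict, reader_schema: dict) -> dict:
    """Project data written with an older schema onto the reader schema."""
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in decoded:
            resolved[name] = decoded[name]      # field present in the writer data
        elif "default" in field:
            resolved[name] = field["default"]   # missing field: use the reader default
        else:
            raise ValueError(f"no value and no default for reader field {name!r}")
    return resolved                             # writer-only fields are dropped

# Data as written under v1, which had "name" and "age".
old_record = {"name": "Al", "age": 30}
new_view = resolve_record(old_record, READER_SCHEMA)
print(new_view)
```

If the reader schema added a field without a default, the same call would raise an error, which is the kind of change a BACKWARD compatibility rule is meant to reject before it reaches consumers.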

Q: Do I need code generation to use Avro? A: No. Avro supports GenericRecord for dynamic usage without generated classes. Code generation is optional but provides type-safe access in statically typed languages.

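Q: Can Avro schemas reference other schemas? A: Yes. Named types (records, enums, and fixed) can be referenced by name once they have been defined, and schemas can be composed using unions and nested records. Sharing types across separate schema files depends on the tooling, for example imports in Avro IDL or parsing the files in dependency order.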
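As a small illustration of referencing a named type, the hypothetical `Order` schema below defines `Address` inline once and then refers to it by name. The helper is a sketch that only collects named types from a schema held as Python data; it ignores namespaces and is not how any particular Avro parser is implemented.

```python
# Hypothetical schema: "Address" is defined at first use and referenced by name after.
ORDER_SCHEMA = {
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "billing", "type": {
            "type": "record",
            "name": "Address",
            "fields": [{"name": "city", "type": "string"}],
        }},
        {"name": "shipping", "type": "Address"},            # reference by name
        {"name": "gift_to", "type": ["null", "Address"]},   # reference inside a union
    ],
}

def collect_named_types(schema, names=None) -> dict:
    """Walk a schema and return a mapping of type name to its definition."""
    names = {} if names is None else names
    if isinstance(schema, list):                 # union: visit every branch
        for branch in schema:
            collect_named_types(branch, names)
    elif isinstance(schema, dict):
        kind = schema["type"]
        if kind in ("record", "enum", "fixed"):  # the three kinds of named types
            names[schema["name"]] = schema
        if kind == "record":
            for field in schema["fields"]:
                collect_named_types(field["type"], names)
        elif kind == "array":
            collect_named_types(schema["items"], names)
        elif kind == "map":
            collect_named_types(schema["values"], names)
    return names

named = collect_named_types(ORDER_SCHEMA)
shipping_type = ORDER_SCHEMA["fields"][1]["type"]
print(sorted(named), shipping_type in named)
```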
