Configs · Jul 2, 2026 · 3 min read

Apache Avro — Schema-Based Data Serialization System

Apache Avro is a compact binary serialization framework with rich schema support, schema evolution, and deep integration with the Hadoop and Kafka ecosystems.

Direct install command
npx -y tokrepo@latest install 128005d7-75f1-11f1-9bc6-00163e2b0d79 --target codex

Run it after confirming the plan with a dry run.

Introduction

Apache Avro is a data serialization system that uses JSON-defined schemas to produce compact binary data. It is one of the most widely used serialization formats with Apache Kafka (usually paired with a schema registry) and is used throughout the Hadoop ecosystem for data storage and RPC, with built-in support for schema evolution.

What Apache Avro Does

  • Serializes structured data into a compact binary format using JSON-defined schemas
  • Supports forward and backward schema evolution without breaking consumers
  • Provides optional code generation for statically typed languages such as Java, C++, and C#
  • Includes an RPC framework for building schema-aware network services
  • Integrates with Kafka, Hadoop, Spark, Flink, and Hive

Architecture Overview

Avro schemas are defined in JSON and describe record types with named fields, each with a type. The binary encoding writes field values in schema-declared order without field names or tags, which typically yields smaller payloads than tagged formats. At deserialization time, the writer schema (used to encode the data) is resolved against the reader schema (expected by the consumer), which is what enables schema evolution. Container files embed the writer schema in the file header, so each file is self-describing. The Schema Registry pattern (used with Kafka) stores schemas centrally and embeds only a schema ID in each message.
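To make the "no field tags" point concrete, here is a minimal Python sketch of Avro's binary encoding for a flat record of primitive fields, following the encoding rules in the Avro specification: `int` and `long` values are zigzag-encoded variable-length integers, a `string` is a length prefix (encoded as a `long`) followed by UTF-8 bytes, a `boolean` is a single byte, and record fields are simply concatenated in schema order. The `encode_record` helper and the `User` schema are hypothetical, only these four primitive types are handled, and real code would use an Avro library instead.

```python
import json

def zigzag(n: int) -> int:
    # Map signed ints to unsigned so small magnitudes stay small: 0->0, -1->1, 1->2, -2->3.
    # For values in the 32-bit range this matches the spec's (n << 1) ^ (n >> 31) for int.
    return (n << 1) ^ (n >> 63)

def write_long(n: int) -> bytes:
    # Base-128 varint: 7 bits per byte, high bit set on every byte except the last
    z = zigzag(n)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)

def write_string(s: str) -> bytes:
    # Length prefix (encoded as a long) followed by the UTF-8 bytes
    data = s.encode("utf-8")
    return write_long(len(data)) + data

def encode_value(avro_type: str, value) -> bytes:
    if avro_type in ("int", "long"):
        return write_long(value)
    if avro_type == "string":
        return write_string(value)
    if avro_type == "boolean":
        return b"\x01" if value else b"\x00"
    raise ValueError(f"type not handled in this sketch: {avro_type}")

def encode_record(schema: dict, record: dict) -> bytes:
    # Fields are written in schema-declared order; no names or tags go on the wire
    return b"".join(encode_value(f["type"], record[f["name"]]) for f in schema["fields"])

USER_SCHEMA = json.loads("""
{"type": "record", "name": "User",
 "fields": [{"name": "id", "type": "int"},
            {"name": "name", "type": "string"},
            {"name": "active", "type": "boolean"}]}
""")

payload = encode_record(USER_SCHEMA, {"id": 1, "name": "ann", "active": True})
print(payload.hex())  # 0206616e6e01
```

The six-byte result contains no field names or tags, which is why a reader needs the writer's schema (from a container file header or a schema registry) to decode it.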

Self-Hosting & Configuration

  • Define schemas in JSON (conventionally as .avsc files) with record types, named fields, and field types
  • Generate language-specific classes using the avro-tools CLI or Maven/Gradle plugin
  • Use GenericRecord for dynamic schema handling without code generation
  • Deploy a Schema Registry (like Confluent Schema Registry) alongside Kafka for centralized schema management
  • Configure the registry's compatibility mode (BACKWARD, FORWARD, FULL) to enforce safe evolution
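The compatibility modes can be approximated with one rule from Avro's schema resolution: a reader schema can read data written with a writer schema if every reader field either exists in the writer or has a default. BACKWARD then means the new schema can read old data, FORWARD means the old schema can read new data, and FULL means both. The sketch below is a deliberately simplified check for flat records; it requires identical field types and ignores type promotion, unions, aliases, and nested types, and `can_read` and `check` are hypothetical helpers, not any registry's API.

```python
import json

def can_read(reader: dict, writer: dict) -> bool:
    # Simplified resolution rule for flat records:
    # every reader field must exist in the writer (same type) or carry a default.
    writer_types = {f["name"]: f["type"] for f in writer["fields"]}
    for field in reader["fields"]:
        name = field["name"]
        if name in writer_types:
            if writer_types[name] != field["type"]:
                return False  # type promotion (e.g. int -> long) is not modeled here
        elif "default" not in field:
            return False
    return True

def check(mode: str, old: dict, new: dict) -> bool:
    backward = can_read(new, old)  # consumers on the new schema read old data
    forward = can_read(old, new)   # consumers on the old schema read new data
    return {"BACKWARD": backward, "FORWARD": forward, "FULL": backward and forward}[mode]

def record(fields_json: str) -> dict:
    return {"type": "record", "name": "User", "fields": json.loads(fields_json)}

v1 = record('[{"name": "id", "type": "int"}, {"name": "name", "type": "string"}]')
# v2 adds a field with a default
v2 = record('[{"name": "id", "type": "int"}, {"name": "name", "type": "string"},'
            ' {"name": "email", "type": "string", "default": ""}]')
# v3 adds a field without a default
v3 = record('[{"name": "id", "type": "int"}, {"name": "name", "type": "string"},'
            ' {"name": "email", "type": "string"}]')

for label, new in (("v2", v2), ("v3", v3)):
    print(label, {m: check(m, v1, new) for m in ("BACKWARD", "FORWARD", "FULL")})
```

Running it reports that v2 passes all three modes, while v3 passes only FORWARD: old consumers simply ignore the extra `email` field, but new consumers have no value to use for `email` when reading old records.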

Key Features

  • Compact binary format with no per-field tags reduces payload size
  • Schema evolution with backward and forward compatibility guarantees
  • Self-describing container files embed the schema for standalone use
  • Language-neutral: libraries exist for Java, Python, C, C++, C#, Ruby, and more
  • Widely used serialization format for Apache Kafka and across the Hadoop ecosystem

Comparison with Similar Tools

  • Protocol Buffers — uses field tags for evolution; Avro uses schema resolution and produces smaller payloads for many workloads
  • JSON — human-readable but verbose; Avro is binary and significantly more compact
  • MessagePack — schema-less binary; Avro enforces schemas for type safety and evolution
  • Thrift — includes RPC and transport; Avro focuses on serialization with simpler schema evolution
  • Parquet — columnar storage format; Avro is row-oriented and used for serialization and messaging

FAQ

Q: Why is Avro so common with Kafka? A: Kafka itself treats message keys and values as opaque bytes, and Avro fills that gap by combining compact binary encoding with schema evolution support. The Schema Registry pattern lets producers and consumers evolve independently while maintaining compatibility.
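With a schema registry, each Kafka message carries a small header instead of the full schema. The sketch below follows the wire format used by Confluent's Schema Registry serializers: a zero magic byte, then the schema ID as a 4-byte big-endian integer, then the Avro binary body. Other registries may frame messages differently, and the schema ID 42 is hypothetical.

```python
import struct

MAGIC_BYTE = 0

def frame(schema_id: int, avro_payload: bytes) -> bytes:
    # 1-byte magic, 4-byte big-endian schema ID, then the raw Avro binary body
    return struct.pack(">BI", MAGIC_BYTE, schema_id) + avro_payload

def unframe(message: bytes) -> tuple[int, bytes]:
    magic, schema_id = struct.unpack(">BI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("not a schema-registry framed message")
    # A consumer would look up schema_id in the registry (usually cached) to get the writer schema
    return schema_id, message[5:]

msg = frame(42, b"\x02\x06ann\x01")  # 42 is a hypothetical registry-assigned ID
print(msg.hex())                     # 000000002a0206616e6e01
print(unframe(msg))                  # (42, b'\x02\x06ann\x01')
```

The per-message overhead is a fixed five bytes, regardless of how large the schema itself is.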

Q: How does schema evolution work? A: Writers and readers can use different schema versions. Fields can be added (with defaults) or removed without breaking existing consumers, as long as compatibility rules are followed.
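A minimal sketch of that resolution step, following the Avro specification's rules for records: data is always decoded with the writer's schema, reader fields are matched to writer fields by name, a reader field missing from the writer takes its default, and writer fields missing from the reader are skipped. The `resolve` helper and both schemas are hypothetical; only flat records of `int`, `long`, `string`, and `boolean` are handled, and type promotion and type-mismatch checks are omitted.

```python
import json

def read_long(buf: bytes, pos: int) -> tuple[int, int]:
    # Read a zigzag-encoded varint starting at pos; return (value, next position)
    shift = result = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last byte
            break
        shift += 7
    return (result >> 1) ^ -(result & 1), pos  # undo zigzag

def read_value(avro_type: str, buf: bytes, pos: int):
    if avro_type in ("int", "long"):
        return read_long(buf, pos)
    if avro_type == "string":
        length, pos = read_long(buf, pos)
        return buf[pos:pos + length].decode("utf-8"), pos + length
    if avro_type == "boolean":
        return buf[pos] == 1, pos + 1
    raise ValueError(f"type not handled in this sketch: {avro_type}")

def resolve(buf: bytes, writer: dict, reader: dict) -> dict:
    # 1. Decode with the WRITER schema, which alone determines the byte layout
    pos, written = 0, {}
    for field in writer["fields"]:
        written[field["name"]], pos = read_value(field["type"], buf, pos)
    # 2. Project onto the READER schema, matching fields by name
    result = {}
    for field in reader["fields"]:
        name = field["name"]
        if name in written:
            result[name] = written[name]
        elif "default" in field:
            result[name] = field["default"]  # field added after the data was written
        else:
            raise ValueError(f"no value or default for reader field {name!r}")
    return result  # writer-only fields were decoded to advance pos, then dropped

WRITER = json.loads('{"type": "record", "name": "User", "fields": ['
                    '{"name": "id", "type": "int"}, {"name": "name", "type": "string"}]}')
READER = json.loads('{"type": "record", "name": "User", "fields": ['
                    '{"name": "id", "type": "int"},'
                    ' {"name": "email", "type": "string", "default": "n/a"}]}')

print(resolve(b"\x02\x06ann", WRITER, READER))  # {'id': 1, 'email': 'n/a'}
```

Note that the writer-only `name` field still has to be decoded, because without tags the reader cannot know where the next field starts except by walking the writer schema.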

Q: Do I need code generation to use Avro? A: No. Avro supports GenericRecord for dynamic usage without generated classes. Code generation is optional but provides type-safe access in statically typed languages.

Q: Can Avro schemas reference other schemas? A: Yes. Avro supports named types that can be referenced across schemas, and schemas can be composed using unions and nested records.
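Unions are also the usual way to model optional values, for example a field of type `["null", "string"]`. Per the Avro specification, a union value is written as the zero-based index of the chosen branch (encoded as an `int`) followed by the value in that branch's own encoding, and `null` takes zero bytes. The sketch below is hypothetical and picks the first branch whose Python type matches exactly, which is a simplification of how real libraries choose a branch.

```python
def write_long(n: int) -> bytes:
    # Zigzag, then base-128 varint (the encoding Avro uses for int and long)
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)

# Exact Python type expected for each supported branch (bool deliberately excluded)
PY_TYPES = {"null": type(None), "int": int, "long": int, "string": str}

def encode_branch(branch: str, value) -> bytes:
    if branch == "null":
        return b""  # null is encoded as zero bytes
    if branch in ("int", "long"):
        return write_long(value)
    if branch == "string":
        data = value.encode("utf-8")
        return write_long(len(data)) + data
    raise ValueError(f"branch not handled in this sketch: {branch}")

def encode_union(branches: list[str], value) -> bytes:
    for index, branch in enumerate(branches):
        if type(value) is PY_TYPES.get(branch):
            # Branch index first, then the value in that branch's own encoding
            return write_long(index) + encode_branch(branch, value)
    raise ValueError("value matches no branch of the union")

optional_name = ["null", "string"]
print(encode_union(optional_name, None).hex())   # 00
print(encode_union(optional_name, "ann").hex())  # 0206616e6e
```

In this example an optional field adds only a one-byte branch index on top of the value itself.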
