Configs · May 17, 2026 · 3 min read

Arroyo — Distributed Stream Processing Engine in Rust

A Rust-based distributed stream processing engine that lets you write SQL or Rust pipelines for real-time data transformation over Kafka, Kinesis, and other sources.

Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, an adapter-specific plan, and the raw content to help agents judge fit, risk, and next actions.

Native · 98/100 · Policy: allow
Agent surface: any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established
Entry point: Arroyo Overview
Universal CLI command:
npx tokrepo install f924d0f2-51a7-11f1-9bc6-00163e2b0d79

Introduction

Arroyo is an open-source distributed stream processing engine written in Rust. It enables developers to build real-time data pipelines using SQL or Rust, with sub-second latency and exactly-once semantics. It is designed for use cases like real-time analytics, feature engineering, and event-driven architectures.

What Arroyo Does

  • Processes streaming data with sub-second end-to-end latency
  • Supports SQL for pipeline definitions with windows, joins, and aggregations
  • Connects to Kafka, Kinesis, MQTT, WebSocket, and HTTP sources and sinks
  • Provides exactly-once processing guarantees via checkpointing
  • Scales horizontally across a cluster of workers
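To make the windowed aggregation mentioned above concrete, here is a minimal in-memory sketch in Rust of what a per-key count over one-minute tumbling windows computes. It is not Arroyo code: the `Event` type and `tumbling_counts` function are hypothetical, and the sketch processes a finished batch rather than an unbounded stream, so it ignores watermarks and late data.

```rust
use std::collections::BTreeMap;

/// Hypothetical event: an event-time timestamp (ms) plus a grouping key.
#[derive(Debug, Clone)]
pub struct Event {
    pub timestamp_ms: u64,
    pub key: String,
}

/// Assigns every event to the tumbling window [start, start + window_ms)
/// that contains it and counts events per (window_start, key).
pub fn tumbling_counts(events: &[Event], window_ms: u64) -> BTreeMap<(u64, String), u64> {
    assert!(window_ms > 0, "window size must be positive");
    let mut counts = BTreeMap::new();
    for e in events {
        // Floor the timestamp to the start of its window.
        let window_start = e.timestamp_ms - e.timestamp_ms % window_ms;
        *counts.entry((window_start, e.key.clone())).or_insert(0) += 1;
    }
    counts
}
```

Tumbling windows never overlap, so each event lands in exactly one window; a sliding (hopping) window would instead assign an event to several windows.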

Architecture Overview

Arroyo compiles SQL or Rust pipelines into a distributed dataflow graph. A controller schedules tasks across workers, each running an async Rust runtime. State is managed with RocksDB and periodically checkpointed to S3 or local disk for fault tolerance. The Web UI allows visual pipeline creation and monitoring.
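A distributed dataflow graph can only run a keyed operation (such as a GROUP BY) in parallel if every record with the same key reaches the same worker. Hash partitioning is a common way to guarantee that; the sketch below shows the idea with Rust's standard hasher. `partition_for` and `shuffle` are hypothetical names, and Arroyo's actual partitioning function is not described on this page, so the choice of hasher here is an assumption.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Picks the parallel subtask (0..parallelism) that owns `key`.
/// DefaultHasher is deterministic within one build, which is all this
/// sketch needs; a real engine needs a hash that is stable across processes.
pub fn partition_for<K: Hash>(key: &K, parallelism: usize) -> usize {
    assert!(parallelism > 0, "parallelism must be at least 1");
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() % parallelism as u64) as usize
}

/// Routes each (key, value) record to the input queue of its owning subtask.
pub fn shuffle<K: Hash, V>(records: Vec<(K, V)>, parallelism: usize) -> Vec<Vec<(K, V)>> {
    let mut queues: Vec<Vec<(K, V)>> = (0..parallelism).map(|_| Vec::new()).collect();
    for (k, v) in records {
        let p = partition_for(&k, parallelism);
        queues[p].push((k, v));
    }
    queues
}
```

Because routing depends only on the key, per-key state (for example a window's running count) lives on exactly one subtask and never needs cross-worker coordination.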

Self-Hosting & Configuration

  • Run via Docker: docker run -p 5115:5115 ghcr.io/arroyosystems/arroyo:latest (the Web UI is then reachable on port 5115)
  • Deploy on Kubernetes using the provided Helm chart for production
  • Configure sources and sinks through the Web UI or YAML connection profiles
  • Set checkpoint interval and state backend in the cluster configuration
  • Scale workers independently from the controller for elasticity

Key Features

  • SQL-first pipeline authoring with streaming extensions (windows, watermarks)
  • Sub-second latency with exactly-once state consistency
  • Built-in Web UI for pipeline creation, monitoring, and backpressure visualization
  • Rust-native performance with no JVM overhead or garbage collection pauses
  • Supports user-defined functions written in Rust for custom logic
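Watermarks let a streaming engine decide that a window is complete even though events can arrive out of order. The sketch below implements one common strategy: a watermark that trails the largest event time seen so far by a fixed bound. It is a generic illustration with a hypothetical `BoundedWatermark` type, not Arroyo's own watermark implementation or configuration.

```rust
/// Watermark that trails the maximum observed event time by a fixed
/// bound ("bounded out-of-orderness").
pub struct BoundedWatermark {
    max_lateness_ms: u64,
    max_seen_ms: Option<u64>,
}

impl BoundedWatermark {
    pub fn new(max_lateness_ms: u64) -> Self {
        Self { max_lateness_ms, max_seen_ms: None }
    }

    /// Current watermark: events with earlier timestamps are considered late.
    /// None until the first event has been observed.
    pub fn current(&self) -> Option<u64> {
        self.max_seen_ms.map(|m| m.saturating_sub(self.max_lateness_ms))
    }

    /// Records an event time; returns true if the event is on time
    /// (at or above the watermark), false if it is late.
    pub fn observe(&mut self, timestamp_ms: u64) -> bool {
        let on_time = self.current().map_or(true, |wm| timestamp_ms >= wm);
        self.max_seen_ms = Some(match self.max_seen_ms {
            Some(m) => m.max(timestamp_ms),
            None => timestamp_ms,
        });
        on_time
    }

    /// A tumbling window [start, start + size) can be emitted once the
    /// watermark has reached its end.
    pub fn window_closed(&self, window_start_ms: u64, window_ms: u64) -> bool {
        self.current().map_or(false, |wm| wm >= window_start_ms + window_ms)
    }
}
```

The bound is a trade-off: a larger value tolerates more disorder but delays every window's result by the same amount.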

Comparison with Similar Tools

  • Apache Flink — mature and feature-rich but requires JVM, heavier operational footprint
  • Apache Kafka Streams — library-based, tightly coupled to Kafka, no SQL layer
  • RisingWave — streaming SQL database with PostgreSQL compatibility, different deployment model
  • Materialize — SQL over streams with Postgres wire protocol, commercial focus
  • Benthos (Redpanda Connect) — config-driven stream processor, no SQL or stateful windowing

FAQ

Q: Can I use Arroyo without Kafka? A: Yes. Arroyo supports many sources including Kinesis, MQTT, WebSocket, HTTP, and file-based inputs.

Q: Does it support exactly-once semantics? A: Yes. Arroyo uses aligned checkpointing similar to Flink to guarantee exactly-once state processing.
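Aligned checkpointing injects numbered barriers into every input stream. An operator with several inputs stops consuming an input once that input's barrier arrives, buffers its later records, and snapshots its state only when the barrier has arrived on all inputs, so the snapshot reflects exactly the records that preceded the barrier. The Rust sketch below shows that alignment step for a single summing operator. `AligningSum` and `Message` are hypothetical; the sketch assumes at most one checkpoint in flight and omits forwarding barriers downstream and persisting snapshots to durable storage.

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, PartialEq)]
pub enum Message {
    Record(i64),
    Barrier(u64),
}

/// Summing operator with several inputs that aligns checkpoint barriers
/// before snapshotting its state.
pub struct AligningSum {
    state: i64,
    blocked: Vec<bool>,             // true once an input's barrier has arrived
    buffered: Vec<VecDeque<i64>>,   // records held back on blocked inputs
    pub snapshots: Vec<(u64, i64)>, // (checkpoint id, state at the barrier)
}

impl AligningSum {
    pub fn new(inputs: usize) -> Self {
        Self {
            state: 0,
            blocked: vec![false; inputs],
            buffered: (0..inputs).map(|_| VecDeque::new()).collect(),
            snapshots: Vec::new(),
        }
    }

    pub fn state(&self) -> i64 {
        self.state
    }

    pub fn receive(&mut self, input: usize, msg: Message) {
        match msg {
            // Records after this input's barrier must not leak into the snapshot.
            Message::Record(v) if self.blocked[input] => self.buffered[input].push_back(v),
            Message::Record(v) => self.state += v,
            Message::Barrier(id) => {
                self.blocked[input] = true;
                if self.blocked.iter().all(|&b| b) {
                    // Aligned: state covers exactly the pre-barrier records.
                    self.snapshots.push((id, self.state));
                    self.blocked.iter_mut().for_each(|b| *b = false);
                    // Resume by replaying the held-back records in arrival order.
                    for q in &mut self.buffered {
                        while let Some(v) = q.pop_front() {
                            self.state += v;
                        }
                    }
                }
            }
        }
    }
}
```

On recovery, an engine restores the latest complete snapshot and rewinds its sources to the matching offsets, which is what turns these consistent snapshots into exactly-once state.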

Q: How does Arroyo compare to Flink on performance? A: Because Arroyo is written in Rust, it avoids JVM overhead and garbage-collection pauses, which can yield lower tail latencies and a smaller memory footprint for many workloads.

Q: Can I write custom processing logic? A: Yes. User-defined functions (UDFs) can be written in Rust and loaded into SQL pipelines.
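The logic of a scalar UDF is an ordinary Rust function. The example below is hypothetical (`latency_bucket` is not part of Arroyo), and it deliberately omits the annotation, crate, and packaging Arroyo requires to register a function; take those from the Arroyo UDF documentation. Using `Option` for the argument and return value is an assumption here about how nullable SQL columns map to Rust.

```rust
/// Hypothetical scalar UDF body: maps a latency in milliseconds to a coarse
/// bucket label. Missing (NULL) or negative latencies yield None.
pub fn latency_bucket(latency_ms: Option<i64>) -> Option<String> {
    let ms = latency_ms?;
    if ms < 0 {
        return None;
    }
    let label = match ms {
        0..=99 => "fast",
        100..=999 => "normal",
        _ => "slow",
    };
    Some(label.to_string())
}
```

Once registered, such a function would be called from SQL like a built-in scalar function, for example in a SELECT list.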
