What is Arroyo?

Arroyo is a Rust-based distributed stream processing engine that lets you write SQL or Rust pipelines for real-time data transformation over Kafka, Kinesis, and other sources.

Is Arroyo free to use?

Yes. Arroyo is freely available on TokRepo. Check the Source & Thanks section on the asset page for the specific open-source license.

How do I install Arroyo?

Visit the asset page on TokRepo and click "Copy for agent" to get the installation instructions. Most assets can be installed with a single command.

Arroyo — Distributed Stream Processing Engine in Rust

Introduction

Arroyo is an open-source distributed stream processing engine written in Rust. It enables developers to build real-time data pipelines using SQL or Rust, with sub-second latency and exactly-once semantics. It is designed for use cases like real-time analytics, feature engineering, and event-driven architectures.

What Arroyo Does

Processes streaming data with sub-second end-to-end latency
Supports SQL for pipeline definitions with windows, joins, and aggregations
Connects to Kafka, Kinesis, MQTT, WebSocket, and HTTP sources and sinks
Provides exactly-once processing guarantees via checkpointing
Scales horizontally across a cluster of workers
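
To make the windowed aggregation idea above concrete, here is a minimal sketch in plain Rust of a tumbling-window count. It does not use Arroyo's API: the `Event` type, the `tumbling_counts` function, and the millisecond timestamps are all assumptions made for illustration, and a real engine would compute this incrementally over an unbounded stream rather than over a slice.

```rust
use std::collections::BTreeMap;

/// A hypothetical event: a key plus an event-time timestamp in milliseconds.
#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub key: String,
    pub timestamp_ms: u64,
}

/// Counts events per (window start, key) using fixed-size, non-overlapping
/// (tumbling) windows of `window_ms` milliseconds.
pub fn tumbling_counts(events: &[Event], window_ms: u64) -> BTreeMap<(u64, String), u64> {
    assert!(window_ms > 0, "window size must be positive");
    let mut counts = BTreeMap::new();
    for event in events {
        // Round the timestamp down to the start of the window it falls into.
        let window_start = event.timestamp_ms - (event.timestamp_ms % window_ms);
        *counts.entry((window_start, event.key.clone())).or_insert(0) += 1;
    }
    counts
}
```

Calling `tumbling_counts(&events, 1000)` groups events into one-second windows keyed by the window's start time, which is roughly what a SQL pipeline with a tumbling window and a `COUNT` aggregation expresses declaratively.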

Architecture Overview

Arroyo compiles SQL or Rust pipelines into a distributed dataflow graph. A controller schedules tasks across workers, each running an async Rust runtime. State is managed with RocksDB and periodically checkpointed to S3 or local disk for fault tolerance. The Web UI allows visual pipeline creation and monitoring.
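
The role of checkpointing in fault tolerance can be sketched as follows. This is an illustration under stated assumptions, not Arroyo's implementation: `CountingOperator` and `Checkpoint` are hypothetical names, an in-memory `HashMap` stands in for the state backend, a cloned struct stands in for a checkpoint written to durable storage, and a slice stands in for a replayable source such as a Kafka partition. The point it shows is that state and source offset are captured together, so recovery resumes from a consistent position.

```rust
use std::collections::HashMap;

/// A snapshot of operator state together with the source position it reflects.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Checkpoint {
    pub next_offset: usize,
    pub counts: HashMap<String, u64>,
}

/// A hypothetical stateful operator that counts records per key.
#[derive(Debug, Default)]
pub struct CountingOperator {
    next_offset: usize,
    counts: HashMap<String, u64>,
}

impl CountingOperator {
    /// Rebuilds the operator from a previously taken checkpoint.
    pub fn restore(checkpoint: &Checkpoint) -> Self {
        Self {
            next_offset: checkpoint.next_offset,
            counts: checkpoint.counts.clone(),
        }
    }

    /// Processes records from the replayable log, starting at the operator's
    /// current offset and stopping before `until` (or at the end of the log).
    pub fn run(&mut self, log: &[&str], until: usize) {
        let end = until.min(log.len());
        while self.next_offset < end {
            *self.counts.entry(log[self.next_offset].to_string()).or_insert(0) += 1;
            self.next_offset += 1;
        }
    }

    /// Captures state and source offset in one snapshot so they stay consistent.
    pub fn checkpoint(&self) -> Checkpoint {
        Checkpoint {
            next_offset: self.next_offset,
            counts: self.counts.clone(),
        }
    }

    /// Read-only view of the current per-key counts.
    pub fn counts(&self) -> &HashMap<String, u64> {
        &self.counts
    }
}
```

If the operator fails after a checkpoint, a fresh instance built with `CountingOperator::restore` replays the log from `next_offset`. Records processed between the checkpoint and the failure are counted again, but only against the restored state, so each record affects the final counts exactly once.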

Self-Hosting & Configuration

Run via Docker: docker run -p 5115:5115 ghcr.io/arroyosystems/arroyo:latest (the port mapping exposes the Web UI, which recent releases serve on port 5115)
Deploy on Kubernetes using the provided Helm chart for production
Configure sources and sinks through the Web UI or YAML connection profiles
Set checkpoint interval and state backend in the cluster configuration
Scale workers independently from the controller for elasticity

Key Features

SQL-first pipeline authoring with streaming extensions (windows, watermarks)
Sub-second latency with exactly-once state consistency
Built-in Web UI for pipeline creation, monitoring, and backpressure visualization
Rust-native performance with no JVM overhead or garbage collection pauses
Supports user-defined functions written in Rust for custom logic
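
Watermarks, mentioned in the list above, are how a streaming engine decides that a window has seen all the events it is going to see. The sketch below shows one common strategy, "largest event time seen minus a fixed lateness bound". It is an assumption for illustration only: `WatermarkTracker` and its methods are hypothetical names, and the strategy shown is not claimed to be the one Arroyo uses by default.

```rust
/// Tracks a watermark as "largest event time seen minus a fixed lateness bound".
#[derive(Debug)]
pub struct WatermarkTracker {
    max_event_time_ms: Option<u64>,
    allowed_lateness_ms: u64,
}

impl WatermarkTracker {
    pub fn new(allowed_lateness_ms: u64) -> Self {
        Self {
            max_event_time_ms: None,
            allowed_lateness_ms,
        }
    }

    /// Records an event timestamp and returns the current watermark.
    /// Out-of-order events never move the watermark backwards.
    pub fn observe(&mut self, event_time_ms: u64) -> u64 {
        let max = self
            .max_event_time_ms
            .map_or(event_time_ms, |m| m.max(event_time_ms));
        self.max_event_time_ms = Some(max);
        max.saturating_sub(self.allowed_lateness_ms)
    }

    /// A window ending at `window_end_ms` may be emitted once the watermark
    /// has reached that end time.
    pub fn window_is_complete(&self, window_end_ms: u64) -> bool {
        match self.max_event_time_ms {
            Some(max) => max.saturating_sub(self.allowed_lateness_ms) >= window_end_ms,
            None => false,
        }
    }
}
```

The lateness bound is the trade-off knob: a larger value tolerates more out-of-order data but delays window results by the same amount.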

Comparison with Similar Tools

Apache Flink: mature and feature-rich, but requires a JVM and carries a heavier operational footprint
Apache Kafka Streams: library-based, tightly coupled to Kafka, with no SQL layer
RisingWave: streaming SQL database with PostgreSQL compatibility and a different deployment model
Materialize: SQL over streams using the Postgres wire protocol, with a commercial focus
Benthos (Redpanda Connect): config-driven stream processor without SQL or stateful windowing

FAQ

Q: Can I use Arroyo without Kafka? A: Yes. Arroyo supports many sources including Kinesis, MQTT, WebSocket, HTTP, and file-based inputs.

Q: Does it support exactly-once semantics? A: Yes. Arroyo uses aligned checkpointing similar to Flink to guarantee exactly-once state processing.
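
As a rough illustration of what "aligned" means here, the sketch below models a single operator with several inputs that receives checkpoint barriers. It is a simplified teaching example, not Arroyo's code: `Message`, `AlignedSum`, and the running-sum state are hypothetical, and the sketch assumes only one checkpoint epoch is in flight at a time. Once a barrier arrives on one input, later records from that input are held back until the barrier has arrived on every input; only then is the state snapshotted.

```rust
use std::collections::VecDeque;

/// What an input channel can deliver: a data record or a checkpoint barrier.
#[derive(Debug, Clone, PartialEq)]
pub enum Message {
    Record(i64),
    Barrier(u64),
}

/// A hypothetical multi-input operator that keeps a running sum and
/// snapshots it only when barriers from all inputs have been aligned.
pub struct AlignedSum {
    sum: i64,
    blocked: Vec<bool>,
    buffered: Vec<VecDeque<i64>>,
    snapshots: Vec<(u64, i64)>,
}

impl AlignedSum {
    pub fn new(num_inputs: usize) -> Self {
        Self {
            sum: 0,
            blocked: vec![false; num_inputs],
            buffered: vec![VecDeque::new(); num_inputs],
            snapshots: Vec::new(),
        }
    }

    pub fn handle(&mut self, input: usize, message: Message) {
        match message {
            // Records behind a barrier belong to the next epoch: hold them back.
            Message::Record(v) if self.blocked[input] => self.buffered[input].push_back(v),
            Message::Record(v) => self.sum += v,
            Message::Barrier(epoch) => {
                self.blocked[input] = true;
                if self.blocked.iter().all(|b| *b) {
                    // All inputs aligned: the state reflects exactly the
                    // records that preceded the barrier on every input.
                    self.snapshots.push((epoch, self.sum));
                    for i in 0..self.blocked.len() {
                        self.blocked[i] = false;
                        while let Some(v) = self.buffered[i].pop_front() {
                            self.sum += v;
                        }
                    }
                }
            }
        }
    }

    pub fn sum(&self) -> i64 {
        self.sum
    }

    pub fn snapshots(&self) -> &[(u64, i64)] {
        &self.snapshots
    }
}
```

Because the snapshot excludes every record that arrived after a barrier, restoring from it and replaying the inputs from their barrier positions applies each record to the state once.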

Q: How does Arroyo compare to Flink on performance? A: Arroyo avoids JVM overhead and garbage collection, achieving lower tail latencies for many workloads with a smaller memory footprint.

Q: Can I write custom processing logic? A: Yes. User-defined functions (UDFs) can be written in Rust and loaded into SQL pipelines.
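
For a sense of scale, the body of a scalar UDF is typically an ordinary Rust function over simple column types. The function below is a hypothetical example, not taken from Arroyo's documentation; the registration mechanics (any required annotation, dependency declaration, or upload step) vary by Arroyo version and are deliberately not shown.

```rust
/// Hypothetical scalar UDF body: converts a temperature reading from
/// Celsius to Fahrenheit. Registration with the engine is not shown.
pub fn celsius_to_fahrenheit(celsius: f64) -> f64 {
    celsius * 9.0 / 5.0 + 32.0
}
```

Once registered, a function like this would be called from SQL by name in a projection or filter, the same way a built-in scalar function is.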
