Configs · May 23, 2026 · 1 min read

Apache Storm — Distributed Real-Time Stream Processing Engine

Apache Storm is a distributed real-time computation system for processing unbounded streams of data with guaranteed message processing and sub-second latency.

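Agent Ready

This asset can be read and installed directly by agents

TokRepo provides a universal CLI command, an install contract, metadata JSON, adapter-specific install plans, and raw content links, so agents can assess fit, risk, and next steps.

Needs Confirmation · 64/100 · Policy: needs confirmation
Agent entry point: any MCP/CLI agent
Type: Skill
Install: Single
Trust level: Established
Entry: Apache Storm Overview
Universal CLI install command: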
npx tokrepo install 6c1d4873-56a2-11f1-9bc6-00163e2b0d79

Introduction

Apache Storm is a distributed real-time computation system for processing unbounded streams of data. Originally created at BackType (acquired by Twitter), Storm provides guaranteed message processing, horizontal scalability, and fault tolerance for applications that require low-latency analytics, continuous computation, and real-time ETL.

What Apache Storm Does

  • Processes over a million tuples per second per node in published benchmarks, with sub-second processing latency
  • Guarantees at-least-once processing through tuple acking, with exactly-once semantics available via Trident
  • Distributes computation across a cluster with automatic task reassignment on failures
  • Supports multiple programming languages through its multi-language protocol (Python, Ruby, JavaScript)
  • Integrates with Kafka, HDFS, HBase, Redis, Cassandra, and other data systems via connectors

Architecture Overview

Storm topologies consist of spouts (data sources) and bolts (processing units) connected in a directed acyclic graph. Nimbus (the master daemon) distributes topology code across the cluster and assigns tasks to Supervisors, which spawn worker processes on each node. ZooKeeper coordinates state between Nimbus and Supervisors. Tuples flow through the topology, and Storm's acker mechanism tracks the completion of each tuple tree to provide reliability guarantees.
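
Storm's documentation describes the acker as keeping one XOR checksum per spout tuple: each tuple id is XORed in once when the tuple is created and once when it is acked, so the checksum returns to zero once every tuple in the tree has been acked. The following is a minimal in-process sketch of that idea. The `Acker` class and `new_tuple_id` helper are hypothetical names for illustration, not Storm's API, and timeouts and failure handling are omitted.

```python
import random


def new_tuple_id():
    """Return a random non-zero 64-bit id (hypothetical helper)."""
    return random.getrandbits(64) or 1


class Acker:
    """Tracks tuple trees with one XOR checksum per spout tuple (sketch)."""

    def __init__(self):
        self.pending = {}  # root id -> running XOR of tuple ids

    def update(self, root_id, value):
        """XOR value into the checksum; return True when the tree is complete."""
        checksum = self.pending.get(root_id, 0) ^ value
        if checksum == 0:
            # Every id has been seen twice (created and acked): tree is done.
            self.pending.pop(root_id, None)
            return True
        self.pending[root_id] = checksum
        return False


def demo():
    acker = Acker()
    root = "spout-msg-1"
    a, b, c = new_tuple_id(), new_tuple_id(), new_tuple_id()
    acker.update(root, a)          # spout emits tuple a
    acker.update(root, a ^ b ^ c)  # bolt acks a while emitting children b and c
    acker.update(root, b)          # downstream bolt acks b
    return acker.update(root, c)   # downstream bolt acks c, completing the tree
```

Because the bolt reports the ack of `a` and the creation of `b` and `c` in a single update, the checksum cannot reach zero while children are still outstanding. The memory cost per spout tuple stays constant regardless of how large the tree grows.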

Self-Hosting & Configuration

  • Requires Java 11+, ZooKeeper 3.5+, and Python 3 (used by the storm command-line client and by Python components written against the multi-language protocol)
  • Configure storm.yaml with nimbus.seeds, supervisor.slots.ports, and storm.zookeeper.servers
  • Set worker heap size and parallelism hints based on workload and available cluster resources
  • Deploy topologies with the storm jar command and monitor them through the Storm UI, which listens on port 8080 by default (ui.port)
  • Enable Kerberos authentication and SSL for production cluster security
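
To make the storm.yaml settings above concrete, here is a small sketch that renders the three keys named in the list. The `render_storm_yaml` helper and the example.com hostnames are hypothetical and not part of Storm; only the key names come from the text above. Each port under supervisor.slots.ports corresponds to one worker slot on that Supervisor.

```python
def render_storm_yaml(zookeeper_servers, nimbus_seeds, slot_ports):
    """Return storm.yaml text for the three core cluster settings (sketch)."""
    lines = ["storm.zookeeper.servers:"]
    lines += [f'  - "{host}"' for host in zookeeper_servers]
    seeds = ", ".join(f'"{host}"' for host in nimbus_seeds)
    lines.append(f"nimbus.seeds: [{seeds}]")
    lines.append("supervisor.slots.ports:")
    lines += [f"  - {port}" for port in slot_ports]
    return "\n".join(lines) + "\n"


def demo():
    # Hypothetical hosts; two ports give this Supervisor two worker slots.
    return render_storm_yaml(
        zookeeper_servers=["zk1.example.com", "zk2.example.com"],
        nimbus_seeds=["nimbus1.example.com"],
        slot_ports=[6700, 6701],
    )
```

Security settings such as Kerberos and SSL are deliberately left out of this sketch, since their keys depend on the deployment.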

Key Features

  • Horizontal scalability with dynamic rebalancing of topology parallelism
  • Guaranteed message processing: at-least-once with core spouts and bolts, exactly-once with Trident
  • Trident API provides high-level abstractions for stateful stream processing and micro-batching
  • Multi-language support allows writing spouts and bolts in Python, Ruby, or any other language that can exchange JSON messages over standard input and output
  • Fault tolerant with automatic worker restart and task reassignment on node failures
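
The multi-language support listed above works by exchanging JSON messages with a subprocess over standard input and output, each message followed by a line containing only `end`. The sketch below shows that framing with a sentence-splitting bolt. The function names are hypothetical, and the initial handshake, heartbeats, and the task-id replies that follow an emit are omitted, so this illustrates the message shape rather than a complete component.

```python
import io
import json


def write_message(stream, message):
    """Write one JSON message followed by the 'end' delimiter line."""
    stream.write(json.dumps(message) + "\nend\n")
    stream.flush()


def read_message(stream):
    """Read lines up to the next 'end' delimiter and decode them as JSON."""
    lines = []
    for line in stream:
        if line.rstrip("\n") == "end":
            break
        lines.append(line)
    return json.loads("".join(lines))


def split_sentence(tuple_msg, out):
    """Emit one anchored tuple per word, then ack the input tuple."""
    for word in tuple_msg["tuple"][0].split():
        write_message(
            out,
            {"command": "emit", "anchors": [tuple_msg["id"]], "tuple": [word]},
        )
    write_message(out, {"command": "ack", "id": tuple_msg["id"]})


def demo():
    # Simulate the parent process with in-memory streams instead of stdin/stdout.
    incoming = io.StringIO()
    write_message(incoming, {"id": "t1", "comp": "1", "stream": "default",
                             "task": 9, "tuple": ["hello storm"]})
    incoming.seek(0)
    outgoing = io.StringIO()
    split_sentence(read_message(incoming), outgoing)
    return outgoing.getvalue()
```

Anchoring each emitted word to the input tuple's id is what lets the reliability tracking tie the new tuples back to the original spout message.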

Comparison with Similar Tools

  • Apache Flink: modern stream processor with event-time semantics and exactly-once state consistency when checkpointing is enabled; Storm is simpler but less feature-rich for stateful processing
  • Apache Kafka Streams: library-based stream processing tied to Kafka; Storm runs as a standalone cluster with broader source support
  • Apache Spark Streaming: micro-batch approach with higher latency; Storm's core API processes each tuple individually, while Trident adds optional micro-batching
  • Apache Samza: stream processor integrated with Kafka and YARN; Storm uses its own resource management
  • Amazon Kinesis Data Analytics (since renamed Amazon Managed Service for Apache Flink): managed streaming service on AWS; Storm is self-hosted and vendor-neutral

FAQ

Q: Is Apache Storm still actively maintained? A: Yes. Storm continues to receive releases under the Apache Software Foundation, though Flink and Kafka Streams have become more popular for new deployments.

Q: What is the Trident API? A: Trident is a high-level abstraction on top of Storm that provides exactly-once processing, stateful operations, and micro-batching for use cases that need stronger consistency guarantees.
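
One way to picture how micro-batching enables exactly-once state updates is the transactional pattern from Trident's state documentation: each batch carries a transaction id, and the state stores the id of the last batch it applied together with the value. The sketch below assumes batches are applied in order and that a replayed batch has the same contents as the original. The `TransactionalCount` class is a hypothetical illustration, not a Trident API.

```python
class TransactionalCount:
    """Counter that applies each batch at most once, keyed by txid (sketch)."""

    def __init__(self):
        self.count = 0
        self.last_txid = None  # stored alongside the count

    def apply_batch(self, txid, batch):
        """Add the batch size to the count unless this txid was already applied."""
        if txid == self.last_txid:
            # Replay of a batch whose update already landed: skip it.
            return self.count
        self.count += len(batch)
        self.last_txid = txid
        return self.count


def demo():
    state = TransactionalCount()
    state.apply_batch(1, ["a", "b"])
    state.apply_batch(1, ["a", "b"])  # replayed after a failure, ignored
    return state.apply_batch(2, ["c"])
```

In a real store the count and the transaction id would have to be written atomically, otherwise a crash between the two writes would break the guarantee.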

Q: How does Storm handle backpressure? A: Storm implements backpressure by monitoring executor queue sizes. When a bolt falls behind, upstream spouts are throttled to prevent memory exhaustion.
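
The queue-based throttling described in this answer can be illustrated with a generic high/low watermark scheme. This is a simplified in-process sketch, not Storm's implementation, whose details differ between releases. The `ThrottledQueue` class and the watermark values are hypothetical.

```python
from collections import deque


class ThrottledQueue:
    """Bounded-queue sketch: producers pause at the high mark, resume at the low mark."""

    def __init__(self, high_mark, low_mark):
        self.items = deque()
        self.high_mark = high_mark
        self.low_mark = low_mark
        self.throttled = False

    def offer(self, item):
        """Producer side: return False without enqueuing while throttled."""
        if self.throttled:
            return False
        self.items.append(item)
        if len(self.items) >= self.high_mark:
            self.throttled = True  # consumer has fallen behind
        return True

    def poll(self):
        """Consumer side: take one item and lift the throttle once drained enough."""
        item = self.items.popleft()
        if self.throttled and len(self.items) <= self.low_mark:
            self.throttled = False
        return item


def demo():
    queue = ThrottledQueue(high_mark=9, low_mark=4)
    accepted = sum(queue.offer(i) for i in range(12))  # producer outpaces consumer
    return accepted, queue.throttled
```

Using two marks instead of one avoids rapid flapping between the throttled and unthrottled states when the queue hovers near a single threshold.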

Q: Can Storm process data from Kafka? A: Yes. The storm-kafka-client module provides a KafkaSpout for consuming Kafka topics with configurable offset management and partition assignment.
