Apache Software Foundation

Verified · @apache-software-foundation

20+ Apache top-level projects on TokRepo — Kafka, Spark, Flink, Airflow, Pulsar, Iceberg, Arrow, ECharts, APISIX. The data backbone modern AI runs on.

6.2K views earned

Last shipped · 2026-04-18

⏱️ Automations

Apache NiFi — Visual Dataflow Automation & Integration Platform

Apache NiFi is a dataflow management system for designing, controlling, and monitoring data pipelines through a drag-and-drop web interface. It is built for enterprise data routing, transformation, and system mediation, with data provenance tracking and guaranteed delivery.

Apr 17, 2026 · 294

🧠 Skills

Apache Pinot — Real-Time Distributed OLAP Datastore

Apache Pinot is a real-time distributed OLAP datastore designed to serve low-latency analytical queries at high throughput. It ingests data from Kafka streams and batch sources and powers user-facing analytics at companies such as LinkedIn, Uber, and Stripe.

Apr 18, 2026 · 313

Apache Hudi — Incremental Data Processing for Data Lakehouses

Apache Hudi (Hadoop Upserts Deletes and Incrementals) is an open-source data lakehouse platform that provides record-level insert, update, and delete capabilities on data lakes. It powers incremental pipelines, CDC ingestion, and near-real-time analytics on S3, GCS, and HDFS.

Apr 17, 2026 · 315
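To make "record-level insert, update, and delete" concrete, here is a toy, in-memory model of upsert semantics in plain Python. It is not Hudi's API: the names `upsert` and `snapshot`, the tuple layout, and the rule that a larger ordering value wins are assumptions for illustration, loosely mirroring Hudi's notions of a record key and a precombine (ordering) field.

```python
from typing import Dict, Iterable, Optional, Tuple

# Toy model, not Hudi's API.
# A change is (record_key, ordering_ts, payload); payload None means "delete".
Change = Tuple[str, int, Optional[dict]]
# The table maps record_key -> (ordering_ts, payload); payload None is a tombstone.
Table = Dict[str, Tuple[int, Optional[dict]]]


def upsert(table: Table, changes: Iterable[Change]) -> None:
    """Merge a batch of changes into the table in place, keyed by record key.

    Tombstones keep their ordering value so that an older, late-arriving
    record cannot resurrect a row that was already deleted.
    """
    for key, ts, payload in changes:
        current = table.get(key)
        if current is not None and current[0] > ts:
            continue  # the stored version is newer, so this change is stale
        table[key] = (ts, payload)  # insert, update, or delete (tombstone)


def snapshot(table: Table) -> Dict[str, dict]:
    """Return only the live rows, hiding tombstoned (deleted) keys."""
    return {key: payload for key, (_, payload) in table.items() if payload is not None}
```

For example, after inserting `u1` and `u2` at ordering value 1, a second batch that updates `u1` at 3, deletes `u2` at 2, and replays a stale `u1` at 2 leaves only the updated `u1` visible: `{"u1": {"name": "Ada L."}}`.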

Apache Beam — Unified Batch and Stream Data Processing

Apache Beam is a unified programming model for defining both batch and streaming data-parallel processing pipelines. You write a pipeline once with the Beam SDK and execute it on any supported runner, such as Apache Spark, Apache Flink, Google Cloud Dataflow, or Apache Samza.

Apr 17, 2026 · 297

Apache DataFusion — Fast In-Process SQL Query Engine in Rust

An extensible query engine written in Rust that uses Apache Arrow as its in-memory format. It embeds directly in an application to run fast analytical SQL queries in-process.

Apr 17, 2026 · 322

Apache Iceberg — Open Table Format for Huge Analytical Datasets

High-performance, engine-agnostic table format that brings ACID transactions, schema evolution, and time travel to data lakes stored as Parquet, ORC, or Avro files.

Apr 16, 2026 · 287
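Time travel in Iceberg works because each commit produces a new table snapshot, and readers can query an older snapshot by id or timestamp. The toy model below sketches that idea in plain Python. `ToyTable`, `commit`, and `as_of` are hypothetical names; real Iceberg snapshots reference manifest lists rather than a flat set of file names, and commits are made atomic by swapping table metadata, which is not modeled here.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int
    # Hypothetical data file names; real snapshots point to manifest lists.
    data_files: FrozenSet[str]


@dataclass
class ToyTable:
    """Toy snapshot log, not Iceberg's API. Assumes increasing commit timestamps."""

    snapshots: List[Snapshot] = field(default_factory=list)

    def current(self) -> Optional[Snapshot]:
        return self.snapshots[-1] if self.snapshots else None

    def commit(self, timestamp_ms: int, add: Iterable[str] = (),
               remove: Iterable[str] = ()) -> Snapshot:
        """Create a new immutable snapshot; older snapshots are never modified."""
        base = self.current()
        files = set(base.data_files) if base else set()
        files -= set(remove)
        files |= set(add)
        snap = Snapshot(len(self.snapshots) + 1, timestamp_ms, frozenset(files))
        self.snapshots.append(snap)
        return snap

    def as_of(self, timestamp_ms: int) -> Optional[Snapshot]:
        """Time travel: the latest snapshot committed at or before timestamp_ms."""
        result = None
        for snap in self.snapshots:
            if snap.timestamp_ms <= timestamp_ms:
                result = snap
        return result
```

After commits at 1000 (add `a.parquet`), 2000 (add `b.parquet`), and 3000 (add `c.parquet`, remove `a.parquet`), `as_of(2500)` still sees `a.parquet` and `b.parquet`, while `current()` sees `b.parquet` and `c.parquet`.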

Apache SeaTunnel — High-Performance Data Integration Engine

Fast, distributed, cloud-native data integration tool for batch and streaming data synchronization across 100+ sources and sinks.

Apr 16, 2026 · 303