Apache Software Foundation
20+ Apache top-level projects on TokRepo: Kafka, Spark, Flink, Airflow, Pulsar, Iceberg, Arrow, ECharts, APISIX. The data backbone that modern AI runs on.
Skills (19)
Apache Pinot — Real-Time Distributed OLAP Datastore
Apache Pinot is a real-time distributed OLAP datastore designed to deliver low-latency analytical queries at high throughput. It ingests data from streaming sources such as Kafka as well as from batch sources, and it powers user-facing analytics at companies such as LinkedIn, Uber, and Stripe.
Apache Hudi — Incremental Data Processing for Data Lakehouses
Apache Hudi (Hadoop Upserts Deletes and Incrementals) is an open-source data lakehouse platform that provides record-level insert, update, and delete capabilities on data lakes. It powers incremental pipelines, CDC ingestion, and near-real-time analytics on S3, GCS, and HDFS.
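Hudi identifies each record by a record key, and when several writes target the same key it can keep the one with the highest value in an ordering (precombine) field. The plain-Python sketch below only mimics those upsert/delete semantics in memory; `Change` and `apply_changes` are hypothetical names, not Hudi's API, and modeling a delete as a tombstone whose value is `None` is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Change:
    key: str               # record key (hypothetical field name)
    ts: int                # ordering value used to pick the latest write
    value: Optional[dict]  # None is a tombstone meaning "delete this key"


def apply_changes(table: Dict[str, Change], batch: Iterable[Change]) -> Dict[str, Change]:
    """Return a new table with the batch upserted or deleted by record key."""
    result = dict(table)
    for change in batch:
        current = result.get(change.key)
        # A late write with an older ordering value must not overwrite a newer one.
        if current is not None and current.ts > change.ts:
            continue
        result[change.key] = change
    # Drop tombstones so deleted keys disappear from the visible table.
    return {k: c for k, c in result.items() if c.value is not None}


table = apply_changes({}, [Change("u1", 1, {"name": "Ann"}), Change("u2", 1, {"name": "Bob"})])
table = apply_changes(table, [
    Change("u1", 3, {"name": "Ann B."}),  # update of an existing key
    Change("u1", 2, {"name": "stale"}),   # older write arriving late: ignored
    Change("u2", 2, None),                # delete
])
print({k: c.value for k, c in table.items()})
```

Only `u1` survives, carrying its newest value; `u2` is removed by the delete. Real Hudi persists these changes as files on S3, GCS, or HDFS rather than in a dictionary.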
Apache Beam — Unified Batch and Stream Data Processing
Apache Beam is a unified programming model for defining both batch and streaming data-parallel processing pipelines. Write a pipeline once against a single API and run it on a supported runner such as Spark, Flink, Dataflow, or Samza.
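Beam's central idea is that the same transform applies to a bounded (batch) or an unbounded (streaming) collection. The stdlib sketch below is not Beam's API; a hypothetical `windowed_counts` function counts keys in fixed event-time windows and runs unchanged on a finite list or an endless generator. It assumes events arrive in timestamp order, which real stream processors cannot assume (Beam handles late data with watermarks and triggers).

```python
import itertools
from collections import Counter
from typing import Iterable, Iterator, Tuple

Event = Tuple[int, str]  # (event time in seconds, key)


def windowed_counts(events: Iterable[Event], size: int) -> Iterator[Tuple[int, dict]]:
    """Yield (window_start, counts) per fixed window; assumes in-order events."""
    current_start = None
    counts: Counter = Counter()
    for ts, key in events:
        start = ts - ts % size  # start of the fixed window containing ts
        if current_start is not None and start != current_start:
            yield current_start, dict(counts)  # window is complete, emit it
            counts = Counter()
        current_start = start
        counts[key] += 1
    if current_start is not None:
        yield current_start, dict(counts)  # only reached for bounded input


# Batch: a finite list.
batch = [(0, "a"), (5, "b"), (12, "a"), (14, "a")]
batch_result = list(windowed_counts(batch, size=10))

# Streaming: an endless generator; take just the first completed window.
stream = ((t, "tick") for t in itertools.count(0, 3))
first_window = next(windowed_counts(stream, size=10))
print(batch_result, first_window)
```

Because the function consumes any iterable lazily, the pipeline logic is written once and the input decides whether it behaves as a batch job or a stream.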
Apache DataFusion — Fast In-Process SQL Query Engine in Rust
An extensible query engine written in Rust that uses Apache Arrow as its in-memory format. It runs fast analytical SQL queries and is designed to be embedded in other applications.
Apache Iceberg — Open Table Format for Huge Analytical Datasets
High-performance, engine-agnostic table format that brings ACID transactions, schema evolution, and time travel to data lake files in Parquet, ORC, or Avro format.
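Iceberg tracks table state as a sequence of snapshots, each listing the data files that make up the table at that point; a commit produces a new snapshot instead of modifying existing files, and time travel reads an older snapshot. The sketch below models only that idea in plain Python. `Snapshot`, `Table`, `commit`, and `files_as_of` are hypothetical names, not the Iceberg or PyIceberg API, and real Iceberg adds manifest files, metadata files, and an atomic catalog pointer swap. It also assumes commits arrive with increasing timestamps.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int
    files: FrozenSet[str]  # data files visible in this snapshot


@dataclass
class Table:
    snapshots: List[Snapshot] = field(default_factory=list)

    def commit(self, timestamp_ms: int, add: Iterable[str] = (),
               remove: Iterable[str] = ()) -> Snapshot:
        """Derive a new snapshot from the latest one; older snapshots stay untouched."""
        current = self.snapshots[-1].files if self.snapshots else frozenset()
        files = (current - frozenset(remove)) | frozenset(add)
        snap = Snapshot(len(self.snapshots) + 1, timestamp_ms, files)
        self.snapshots.append(snap)
        return snap

    def files_as_of(self, timestamp_ms: int) -> FrozenSet[str]:
        """Time travel: files of the latest snapshot at or before timestamp_ms."""
        visible = [s for s in self.snapshots if s.timestamp_ms <= timestamp_ms]
        if not visible:
            raise LookupError("no snapshot at or before that time")
        return visible[-1].files


t = Table()
t.commit(1000, add=["a.parquet"])
t.commit(2000, add=["b.parquet"])
t.commit(3000, add=["c.parquet"], remove=["a.parquet"])  # overwrite replacing a.parquet
print(sorted(t.files_as_of(2500)), sorted(t.files_as_of(3000)))
```

A reader pinned to time 2500 still sees `a.parquet` even after the overwrite, which is what makes reproducible queries over a changing table possible.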
Apache SeaTunnel — High-Performance Data Integration Engine
Fast, distributed, cloud-native data integration tool for batch and streaming data synchronization across 100+ sources and sinks.