Scripts · May 23, 2026 · 1 min read

Apache DolphinScheduler — Distributed Data Workflow Orchestration Platform

Apache DolphinScheduler is a cloud-native workflow orchestration platform with a visual DAG editor, multi-tenant support, and distributed task execution for data pipelines.

Introduction

Apache DolphinScheduler is a distributed, cloud-native workflow orchestration platform designed for data pipeline scheduling. It provides a drag-and-drop visual DAG editor, multi-tenant isolation, and built-in support for dozens of task types including Shell, SQL, Spark, Flink, and Python, making it a strong choice for data engineering teams managing complex ETL workflows.

What Apache DolphinScheduler Does

  • Orchestrates complex data workflows as directed acyclic graphs (DAGs) with visual editing
  • Schedules tasks with cron expressions, manual triggers, and event-driven dependencies
  • Supports 30+ task types including Shell, Python, SQL, Spark, Flink, MapReduce, and HTTP
  • Provides multi-tenant resource isolation with role-based access control
  • Monitors workflow execution with real-time logs, alerts, and retry mechanisms
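The DAG model in the first bullet can be illustrated with Python's standard `graphlib` module. This is a minimal sketch and not DolphinScheduler code. The task names are hypothetical, and grouping ready tasks into "waves" is an assumed simplification of how a scheduler releases tasks once all their upstream dependencies have finished.

```python
from graphlib import TopologicalSorter

# Hypothetical ETL workflow: each task maps to the set of tasks it depends on.
workflow = {
    "extract_orders": set(),
    "extract_users": set(),
    "clean_orders": {"extract_orders"},
    "join": {"clean_orders", "extract_users"},
    "load_warehouse": {"join"},
}


def execution_waves(dag: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; every task in a wave has all upstreams completed."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose dependencies are all done
        waves.append(ready)
        sorter.done(*ready)  # pretend every task in this wave succeeded
    return waves


print(execution_waves(workflow))
# [['extract_orders', 'extract_users'], ['clean_orders'], ['join'], ['load_warehouse']]
```

Tasks in the same wave have no dependency on each other, so a distributed scheduler is free to run them in parallel on different workers.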

Architecture Overview

DolphinScheduler uses a master-worker architecture. MasterServers parse workflow DAGs, schedule tasks, and manage workflow state, coordinating through a registry (ZooKeeper or a database-backed registry) that provides distributed locking and failover. WorkerServers receive the tasks that masters dispatch to them and execute them in separate processes. An API server exposes the REST endpoints consumed by the web frontend. All metadata and workflow definitions are stored in a relational database (MySQL or PostgreSQL).
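A toy, single-process model of the master-worker split can make the division of labour concrete. This sketch assumes a least-loaded dispatch policy and uses an in-memory dict in place of the metadata database. The classes `Master` and `Worker`, the state strings, and the selection policy are all hypothetical. The real MasterServer talks to workers over the network and persists state to MySQL or PostgreSQL.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    host: str
    running: list[str] = field(default_factory=list)  # tasks currently executing


@dataclass
class Master:
    workers: list[Worker]
    # Hypothetical stand-in for the metadata database: task name -> state.
    task_state: dict[str, str] = field(default_factory=dict)

    def dispatch(self, task: str) -> str:
        """Assign a ready task to the least-loaded worker and record the dispatch."""
        if not self.workers:
            raise RuntimeError("no workers registered")
        # min() returns the first worker among ties, so dispatch alternates evenly.
        worker = min(self.workers, key=lambda w: len(w.running))
        worker.running.append(task)
        self.task_state[task] = "DISPATCHED"
        return worker.host

    def report(self, task: str, host: str, succeeded: bool) -> None:
        """Called when a worker finishes a task: free its slot and update state."""
        worker = next(w for w in self.workers if w.host == host)
        worker.running.remove(task)
        self.task_state[task] = "SUCCESS" if succeeded else "FAILURE"


master = Master([Worker("worker-1"), Worker("worker-2")])
hosts = [master.dispatch(t) for t in ["extract", "clean", "load"]]
master.report("extract", "worker-1", succeeded=True)
```

Here `hosts` is `["worker-1", "worker-2", "worker-1"]`, and after the report `worker-1` is only running `load`. Keeping state on the master side (and, in the real system, in the database) is what lets another master take over a workflow if one fails.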

Self-Hosting & Configuration

  • Requires Java 8+, a relational database (MySQL 5.7+ or PostgreSQL 12+), and optionally ZooKeeper
  • Deploy as standalone, pseudo-cluster, or full cluster mode depending on scale
  • Configure datasource connections in the web UI for Hive, Spark, PostgreSQL, and other engines
  • Set worker groups to route specific task types to designated machines
  • Enable alerting via email, DingTalk, WeChat, PagerDuty, or custom webhook plugins
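Worker-group routing can be sketched as a lookup from group name to hosts, followed by a load-spreading choice within the group. The group name `default` mirrors DolphinScheduler's default worker group, but the `WorkerGroupRouter` class, the other group names, and the round-robin policy are illustrative assumptions rather than the project's implementation.

```python
import itertools
from collections.abc import Iterator


class WorkerGroupRouter:
    """Hypothetical router: round-robin over the hosts of a task's worker group."""

    def __init__(self, groups: dict[str, list[str]]) -> None:
        # One independent round-robin cursor per group; empty groups are skipped.
        self._cursors: dict[str, Iterator[str]] = {
            name: itertools.cycle(hosts) for name, hosts in groups.items() if hosts
        }

    def route(self, worker_group: str = "default") -> str:
        cursor = self._cursors.get(worker_group)
        if cursor is None:
            raise LookupError(f"no workers available in group {worker_group!r}")
        return next(cursor)


router = WorkerGroupRouter({
    "default": ["worker-1", "worker-2"],
    "spark": ["spark-node-1"],  # e.g. machines with Spark client libraries installed
})
picks = [router.route("spark"), router.route(), router.route(), router.route()]
```

`picks` is `["spark-node-1", "worker-1", "worker-2", "worker-1"]`. A task pinned to the `spark` group never lands on a general-purpose worker, which is the point of configuring worker groups.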

Key Features

  • Visual drag-and-drop workflow designer with sub-workflow support and parameter passing
  • Distributed architecture with horizontal scaling of master and worker nodes
  • Complement (backfill) runs over historical date ranges, plus dependent scheduling for cross-workflow coordination
  • Built-in resource center for managing scripts, configuration files, and UDFs
  • SLA monitoring with configurable timeout alerts and failure retry policies
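The retry and timeout-alert behaviour in the last bullet can be approximated with a small wrapper. This is a sketch under stated assumptions: the function `run_with_retries` and its parameters are hypothetical, any exception counts as a failure, and a timeout only triggers a warning after the attempt finishes instead of killing the task. `sleep` and `clock` are injectable so the logic can be tested without real delays.

```python
import time
from typing import Callable


def run_with_retries(
    task: Callable[[], None],
    max_retries: int,
    retry_interval_s: float,
    timeout_s: float,
    alert: Callable[[str], None],
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Run `task` up to 1 + max_retries times; alert on slow attempts and final failure."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(1, max_retries + 2):
        start = clock()
        try:
            task()
            succeeded = True
        except Exception as exc:  # any exception counts as a task failure
            succeeded = False
            error = exc
        # Timeout is checked after the attempt ends: it warns, it does not interrupt.
        if clock() - start > timeout_s:
            alert(f"attempt {attempt} exceeded {timeout_s}s timeout")
        if succeeded:
            return True
        if attempt <= max_retries:
            sleep(retry_interval_s)  # wait before the next retry
    alert(f"task failed after {max_retries + 1} attempts: {error!r}")
    return False
```

A flaky task that fails twice and then succeeds, run with `max_retries=3`, returns `True` after two waits of `retry_interval_s` and raises no alert. A task that always fails exhausts its retries and produces exactly one failure alert.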

Comparison with Similar Tools

  • Apache Airflow — Python-centric DAG scheduler with code-as-config; DolphinScheduler offers a visual editor and multi-tenancy out of the box
  • Dagster — asset-focused orchestration with strong testing; DolphinScheduler focuses on operational scheduling at scale
  • Prefect — Python-native workflow engine with a managed cloud option; DolphinScheduler provides more built-in task types
  • Azkaban — LinkedIn's batch workflow scheduler; DolphinScheduler has a more modern architecture and active development
  • Luigi — lightweight Python pipeline framework; DolphinScheduler adds distributed execution and a full web UI

FAQ

Q: How does DolphinScheduler differ from Airflow? A: DolphinScheduler provides a visual DAG editor, native multi-tenancy, and a master-worker distributed architecture, while Airflow defines workflows as Python code and relies on Celery or Kubernetes for distribution.

Q: Can it integrate with cloud services? A: Yes. It includes task types for cloud services such as Amazon EMR, and it can reach cloud-hosted SQL databases through JDBC datasource connections.

Q: What scale can DolphinScheduler handle? A: Production deployments manage tens of thousands of concurrent tasks across hundreds of worker nodes.

Q: Is there a managed cloud version? A: DolphinScheduler is self-hosted. Some cloud providers offer it as part of their managed data platform services.
