Scripts2026年5月23日·1 分钟阅读

Apache DolphinScheduler — Distributed Data Workflow Orchestration Platform

Apache DolphinScheduler is a cloud-native workflow orchestration platform with a visual DAG editor, multi-tenant support, and distributed task execution for data pipelines.

Agent 就绪

Agent 可直接安装

这个资产可安装;Agent 先选择当前运行时、检查安装计划,再运行匹配命令。

Native · 98/100策略:允许
Agent 入口
任意 MCP/CLI Agent
类型
Skill
安装
Single
信任
信任等级:Established
入口
Apache DolphinScheduler Overview
直接安装命令
npx -y tokrepo@latest install bb2a3f0d-56a1-11f1-9bc6-00163e2b0d79 --target codex

先 dry-run 确认安装计划,再运行此命令。

Introduction

Apache DolphinScheduler is a distributed, cloud-native workflow orchestration platform designed for data pipeline scheduling. It provides a drag-and-drop visual DAG editor, multi-tenant isolation, and built-in support for dozens of task types including Shell, SQL, Spark, Flink, and Python, making it a strong choice for data engineering teams managing complex ETL workflows.

What Apache DolphinScheduler Does

  • Orchestrates complex data workflows as directed acyclic graphs (DAGs) with visual editing
  • Schedules tasks with cron expressions, manual triggers, and event-driven dependencies
  • Supports 30+ task types including Shell, Python, SQL, Spark, Flink, MapReduce, and HTTP
  • Provides multi-tenant resource isolation with role-based access control
  • Monitors workflow execution with real-time logs, alerts, and retry mechanisms

Architecture Overview

DolphinScheduler uses a master-worker architecture. MasterServer handles DAG parsing, task scheduling, and workflow state management using a distributed lock via ZooKeeper or a database. WorkerServers pull tasks from the queue and execute them in isolated processes. An API server exposes REST endpoints consumed by the web frontend. All metadata and workflow definitions are stored in a relational database (MySQL or PostgreSQL).

Self-Hosting & Configuration

  • Requires Java 8+, a relational database (MySQL 5.7+ or PostgreSQL 12+), and optionally ZooKeeper
  • Deploy as standalone, pseudo-cluster, or full cluster mode depending on scale
  • Configure datasource connections in the web UI for Hive, Spark, PostgreSQL, and other engines
  • Set worker groups to route specific task types to designated machines
  • Enable alerting via email, DingTalk, WeChat, PagerDuty, or custom webhook plugins

Key Features

  • Visual drag-and-drop workflow designer with sub-workflow support and parameter passing
  • Distributed architecture with horizontal scaling of master and worker nodes
  • Complement and dependent scheduling modes for cross-workflow coordination
  • Built-in resource center for managing scripts, configuration files, and UDFs
  • SLA monitoring with configurable timeout alerts and failure retry policies

Comparison with Similar Tools

  • Apache Airflow — Python-centric DAG scheduler with code-as-config; DolphinScheduler offers a visual editor and multi-tenancy out of the box
  • Dagster — asset-focused orchestration with strong testing; DolphinScheduler focuses on operational scheduling at scale
  • Prefect — Python-native workflow engine with a managed cloud option; DolphinScheduler provides more built-in task types
  • Azkaban — LinkedIn's batch workflow scheduler; DolphinScheduler has a more modern architecture and active development
  • Luigi — lightweight Python pipeline framework; DolphinScheduler adds distributed execution and a full web UI

FAQ

Q: How does DolphinScheduler differ from Airflow? A: DolphinScheduler provides a visual DAG editor, native multi-tenancy, and a master-worker distributed architecture, while Airflow defines workflows as Python code and relies on Celery or Kubernetes for distribution.

Q: Can it integrate with cloud services? A: Yes. It supports task types for AWS EMR, Google Dataproc, and various cloud SQL services via JDBC connections.

Q: What scale can DolphinScheduler handle? A: Production deployments manage tens of thousands of concurrent tasks across hundreds of worker nodes.

Q: Is there a managed cloud version? A: DolphinScheduler is self-hosted. Some cloud providers offer it as part of their managed data platform services.

Sources

讨论

登录后参与讨论。
还没有评论,来写第一条吧。

相关资产