Scripts · May 23, 2026 · 3 min read

Apache DolphinScheduler — Distributed Data Workflow Orchestration Platform

Apache DolphinScheduler is a cloud-native workflow orchestration platform with a visual DAG editor, multi-tenant support, and distributed task execution for data pipelines.

Universal CLI command
npx tokrepo install bb2a3f0d-56a1-11f1-9bc6-00163e2b0d79

Introduction

Apache DolphinScheduler is a distributed, cloud-native workflow orchestration platform designed for data pipeline scheduling. It provides a drag-and-drop visual DAG editor, multi-tenant isolation, and built-in support for dozens of task types including Shell, SQL, Spark, Flink, and Python, making it a strong choice for data engineering teams managing complex ETL workflows.

What Apache DolphinScheduler Does

  • Orchestrates complex data workflows as directed acyclic graphs (DAGs) with visual editing
  • Schedules tasks with cron expressions, manual triggers, and event-driven dependencies
  • Supports 30+ task types including Shell, Python, SQL, Spark, Flink, MapReduce, and HTTP
  • Provides multi-tenant resource isolation with role-based access control
  • Monitors workflow execution with real-time logs, alerts, and retry mechanisms
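To make the DAG idea concrete, here is a minimal sketch in plain Python of how a dependency graph yields a valid run order. It uses only the standard-library `graphlib` module and does not call any DolphinScheduler API; the task names and the `execution_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key is a task, each value is the set of
# tasks that must finish before it can start.
workflow = {
    "extract_orders": set(),
    "extract_users": set(),
    "join_tables": {"extract_orders", "extract_users"},
    "load_warehouse": {"join_tables"},
    "send_report": {"load_warehouse"},
}


def execution_order(dag):
    """Return one valid run order for the DAG.

    graphlib raises CycleError if the graph contains a cycle, which is
    why a workflow must be acyclic to be schedulable.
    """
    return list(TopologicalSorter(dag).static_order())


order = execution_order(workflow)
print(order)
```

The two extract tasks have no dependencies, so either may come first; every other task appears only after the tasks it depends on.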

Architecture Overview

DolphinScheduler uses a master-worker architecture. The MasterServer handles DAG parsing, task scheduling, and workflow state management, and coordinates with other nodes through distributed locks backed by ZooKeeper or a database. WorkerServers execute the tasks assigned to them in isolated processes. An API server exposes the REST endpoints consumed by the web frontend. All metadata and workflow definitions are stored in a relational database (MySQL or PostgreSQL).
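The master-worker split described above can be illustrated with a toy, single-process sketch. This is not how DolphinScheduler is implemented: in-memory queues and threads stand in for the real network communication and registry, and `run_workflow` is a hypothetical name. It only shows the division of labor, where a master tracks DAG state and releases tasks once their dependencies finish, while workers execute whatever they are handed.

```python
import queue
import threading
from graphlib import TopologicalSorter


def run_workflow(dag, num_workers=2):
    """Toy master loop: release ready tasks to workers, return completion order."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    task_queue = queue.Queue()  # master -> workers
    done_queue = queue.Queue()  # workers -> master
    finished = []

    def worker():
        while True:
            task = task_queue.get()
            if task is None:  # sentinel value: shut this worker down
                break
            # A real worker would launch a Shell/SQL/Spark process here.
            done_queue.put(task)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    # Master: hand out every task whose dependencies are complete, then
    # wait for any one task to finish before checking for newly ready tasks.
    while sorter.is_active():
        for task in sorter.get_ready():
            task_queue.put(task)
        task = done_queue.get()
        sorter.done(task)
        finished.append(task)

    for _ in threads:
        task_queue.put(None)
    for t in threads:
        t.join()
    return finished
```

Calling `run_workflow({"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}})` always completes `a` first and `d` last, while `b` and `c` may finish in either order because they can run in parallel.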

Self-Hosting & Configuration

  • Requires Java 8+, a relational database (MySQL 5.7+ or PostgreSQL 12+), and optionally ZooKeeper
  • Deploy as standalone, pseudo-cluster, or full cluster mode depending on scale
  • Configure datasource connections in the web UI for Hive, Spark, PostgreSQL, and other engines
  • Set worker groups to route specific task types to designated machines
  • Enable alerting via email, DingTalk, WeChat, PagerDuty, or custom webhook plugins
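Worker groups in the list above amount to a routing table from a group name to a set of machines. The sketch below is a simplified illustration with hypothetical group names, host names, and a hypothetical `GroupRouter` class; it assumes round-robin selection within a group, which is only one possible strategy and is not taken from DolphinScheduler's code.

```python
import itertools

# Hypothetical worker groups; in a real deployment these come from
# configuration rather than being hard-coded.
WORKER_GROUPS = {
    "default": ["worker-1", "worker-2"],
    "spark": ["spark-worker-1"],
}


class GroupRouter:
    """Pick a host for a task from its worker group, round-robin."""

    def __init__(self, groups):
        self._cycles = {
            name: itertools.cycle(hosts) for name, hosts in groups.items()
        }

    def pick_host(self, worker_group="default"):
        if worker_group not in self._cycles:
            raise KeyError(f"unknown worker group: {worker_group}")
        return next(self._cycles[worker_group])


router = GroupRouter(WORKER_GROUPS)
print(router.pick_host("spark"))  # the only host in the "spark" group
print(router.pick_host())         # first host in the "default" group
```

Routing by group keeps tasks that need a specific runtime (for example, a Spark client) on machines that have it installed.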

Key Features

  • Visual drag-and-drop workflow designer with sub-workflow support and parameter passing
  • Distributed architecture with horizontal scaling of master and worker nodes
  • Complement and dependent scheduling modes for cross-workflow coordination
  • Built-in resource center for managing scripts, configuration files, and UDFs
  • SLA monitoring with configurable timeout alerts and failure retry policies
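The failure-retry and timeout-alert behavior in the last bullet can be sketched generically. The function and parameter names below (`run_with_retries`, `max_retries`, `retry_interval`, `timeout`, `on_timeout`) are hypothetical and do not mirror DolphinScheduler's configuration keys; the sketch assumes a timeout only triggers an alert and does not kill the task.

```python
import time


def run_with_retries(task, max_retries=3, retry_interval=0.0,
                     timeout=None, on_timeout=None):
    """Run task() and retry on failure.

    Allows up to max_retries retries after the first attempt, sleeping
    retry_interval seconds between attempts. If a successful attempt
    took longer than timeout seconds, on_timeout(elapsed) is called as
    an alert hook. Returns (result, attempts_used).
    """
    attempts = 0
    while True:
        attempts += 1
        start = time.monotonic()
        try:
            result = task()
        except Exception:
            if attempts > max_retries:
                raise  # retries exhausted: surface the failure
            time.sleep(retry_interval)
            continue
        elapsed = time.monotonic() - start
        if timeout is not None and on_timeout is not None and elapsed > timeout:
            on_timeout(elapsed)
        return result, attempts
```

A task that fails twice with a transient error and then succeeds would return its result with an attempt count of 3; a task that keeps failing raises its last exception after `max_retries + 1` attempts.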

Comparison with Similar Tools

  • Apache Airflow — Python-centric DAG scheduler with code-as-config; DolphinScheduler offers a visual editor and multi-tenancy out of the box
  • Dagster — asset-focused orchestration with strong testing; DolphinScheduler focuses on operational scheduling at scale
  • Prefect — Python-native workflow engine with a managed cloud option; DolphinScheduler provides more built-in task types
  • Azkaban — LinkedIn's batch workflow scheduler; DolphinScheduler has a more modern architecture and active development
  • Luigi — lightweight Python pipeline framework; DolphinScheduler adds distributed execution and a full web UI

FAQ

Q: How does DolphinScheduler differ from Airflow?
A: DolphinScheduler provides a visual DAG editor, native multi-tenancy, and a master-worker distributed architecture, while Airflow defines workflows as Python code and relies on Celery or Kubernetes for distribution.

Q: Can it integrate with cloud services?
A: Yes. It supports task types for AWS EMR, Google Dataproc, and various cloud SQL services via JDBC connections.

Q: What scale can DolphinScheduler handle?
A: Production deployments manage tens of thousands of concurrent tasks across hundreds of worker nodes.

Q: Is there a managed cloud version?
A: DolphinScheduler is self-hosted. Some cloud providers offer it as part of their managed data platform services.
