Configs · Apr 15, 2026 · 3 min read

Argo Workflows — Kubernetes-Native Workflow Engine

Argo Workflows is a CNCF-graduated workflow engine for orchestrating parallel jobs on Kubernetes, modelling pipelines as DAGs where each step runs as a container.

Introduction

Argo Workflows models data pipelines, CI jobs, and ML training runs as Kubernetes Custom Resources. Each step is a container, workflows compose as DAGs, and fan-out/fan-in is first-class: you can run thousands of parallel pods against a single cluster with native retries, artifact passing, and Prometheus metrics.

What Argo Workflows Does

  • Runs DAG or sequential step workflows where every node is a containerized task.
  • Passes artifacts (S3/GCS/MinIO/HTTP/git) and parameters between steps automatically.
  • Supports dynamic fan-out via withItems, withParam, and loops over JSON outputs.
  • Retries, timeouts, suspend/resume, cron schedules, and manual approval gates built in.
  • Exposes a web UI, gRPC/REST API, and a CLI for submitting, watching, and debugging runs.
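The ideas above fit in a single manifest. Here is a minimal sketch of a two-step DAG that fans out over a list with withItems; the names (hello-dag, print-msg) and the alpine image are illustrative choices, not anything prescribed by Argo:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-dag-
spec:
  entrypoint: main
  templates:
    - name: main
      dag:
        tasks:
          - name: generate
            template: print-msg
            arguments:
              parameters:
                - name: msg
                  value: "start"
          - name: fan-out
            dependencies: [generate]   # runs after generate completes
            template: print-msg
            withItems: ["a", "b", "c"] # three parallel pods, one per item
            arguments:
              parameters:
                - name: msg
                  value: "{{item}}"
    - name: print-msg
      inputs:
        parameters:
          - name: msg
      container:
        image: alpine:3.19
        command: [echo, "{{inputs.parameters.msg}}"]
```

Submitting this with argo submit --watch creates one pod per DAG node; the fan-out task expands into three pods that run concurrently.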

Architecture Overview

A workflow controller runs in the cluster, watches Workflow and CronWorkflow CRs, and schedules one Pod per step with a lightweight sidecar (argoexec) that handles artifact I/O and log streaming. State is stored directly in the CR plus an optional offload database (Postgres/MySQL) for very large workflows. The argo-server provides a stateless gRPC/REST gateway and the React UI.

Self-Hosting & Configuration

  • Deploy via the official manifests or the Helm chart; controller and server are separate Deployments.
  • Configure artifact storage once in workflow-controller-configmap.yaml (S3, GCS, Azure, OCI, MinIO).
  • Turn on SSO via the sso block in the controller ConfigMap (OIDC client ID and secret stored in a Kubernetes Secret) to integrate with Dex, Okta, or GitHub.
  • Enable workflow archival with Postgres for long-term history and search.
  • Tune scale with the controller's parallelism setting, pod GC policies, and podPriorityClassName for high-fan-out training jobs.
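The artifact-store and parallelism settings above live together in the controller ConfigMap. A sketch, assuming a MinIO deployment and secret names (my-artifacts, my-minio-cred) that you would replace with your own:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
  namespace: argo
data:
  parallelism: "200"              # global cap on concurrently running workflows
  artifactRepository: |
    s3:
      bucket: my-artifacts        # assumed bucket name
      endpoint: minio.argo:9000   # assumed in-cluster MinIO endpoint
      insecure: true
      accessKeySecret:
        name: my-minio-cred
        key: accesskey
      secretKeySecret:
        name: my-minio-cred
        key: secretkey
```

Once this is in place, every step's declared output artifacts are uploaded to the bucket by argoexec without any per-workflow storage configuration.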

Key Features

  • Events engine (argo-events) for trigger-driven pipelines from webhooks, Kafka, S3, or CronJobs.
  • Workflow templates and ClusterWorkflowTemplate for reusable, versioned pipeline building blocks.
  • Rich control-flow primitives: suspend/resume, retry strategies, custom metrics, exit handlers.
  • Works seamlessly with Argo CD for full GitOps pipeline delivery.
  • Production-proven — CNCF Graduated, used by Intuit, BlackRock, NVIDIA, and many others.
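Retry strategies and exit handlers compose naturally in a reusable template. A sketch of a WorkflowTemplate (the names retrying-task, flaky, and notify are illustrative) that retries a failing step with exponential backoff and always runs a notification step on exit:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: retrying-task
spec:
  entrypoint: flaky
  onExit: notify                  # exit handler runs on success or failure
  templates:
    - name: flaky
      retryStrategy:
        limit: "3"                # up to three retries
        backoff:
          duration: "10s"
          factor: "2"             # 10s, 20s, 40s between attempts
      container:
        image: alpine:3.19
        command: [sh, -c, "echo attempt && exit 1"]  # always fails, to show retries
    - name: notify
      container:
        image: alpine:3.19
        command: [echo, "workflow finished: {{workflow.status}}"]
```

Workflows can then reference these templates via templateRef, giving teams versioned, reusable building blocks instead of copy-pasted step definitions.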

Comparison with Similar Tools

  • Apache Airflow — Python-centric DAGs; Argo is YAML/CRD-native and container-per-step from the ground up.
  • Tekton Pipelines — also Kubernetes-native but focused on CI/CD; Argo is broader (ML, data, batch).
  • Prefect / Dagster — Python-first data orchestrators; Argo is language-agnostic via containers.
  • Kubeflow Pipelines — uses Argo Workflows as its underlying execution engine.
  • Kestra / Flyte — similar goals; Flyte has stronger typed data pipelines, Argo stronger k8s integration.

FAQ

Q: Do I need a database? A: No for small clusters — state lives in the CR. For many/huge workflows, enable Postgres offload + archival.
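Enabling the offload and archive mentioned above is a persistence block in the controller ConfigMap. A sketch, assuming an in-cluster Postgres service and a secret named argo-postgres-config:

```yaml
data:
  persistence: |
    nodeStatusOffLoad: true       # move large node status out of the Workflow CR
    archive: true                 # keep completed workflows searchable long-term
    postgresql:
      host: postgres.argo         # assumed service name
      port: 5432
      database: argo
      tableName: argo_workflows
      userNameSecret:
        name: argo-postgres-config
        key: username
      passwordSecret:
        name: argo-postgres-config
        key: password
```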

Q: How do artifacts work? A: Configure a bucket once; each step declares inputs.artifacts and outputs.artifacts which argoexec moves.
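In template terms, that declaration looks like this sketch: one step writes a file and exposes it as an output artifact, and a downstream step mounts it as an input (file paths and template names are illustrative):

```yaml
templates:
  - name: produce
    container:
      image: alpine:3.19
      command: [sh, -c, "echo hello > /tmp/result.txt"]
    outputs:
      artifacts:
        - name: result
          path: /tmp/result.txt   # argoexec uploads this file to the bucket
  - name: consume
    inputs:
      artifacts:
        - name: result
          path: /tmp/result.txt   # argoexec downloads it here before the step runs
    container:
      image: alpine:3.19
      command: [cat, /tmp/result.txt]
```

Wiring produce's output to consume's input happens in the DAG task arguments; the steps themselves only see local files.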

Q: Can I run it on a laptop? A: Yes — minikube, kind, or Colima give you a local cluster; Argo installs cleanly in a few minutes.

Q: How is it different from Argo CD? A: Argo CD is GitOps for Kubernetes resources; Argo Workflows runs pipelines. They share maintainers and often compose.
