Scripts · Jul 1, 2026 · 3 min read

Volcano — Kubernetes Batch and HPC Job Scheduler

Volcano is a cloud-native batch scheduling system for Kubernetes that supports machine learning, deep learning, bioinformatics, and high-performance computing workloads with advanced scheduling policies.

Agent ready

Ready-to-run agent install

This asset can be installed after the agent chooses its runtime, checks the plan, and runs the matching command.

Native · 98/100 · Policy: allow
Agent surface: Any MCP/CLI agent
Kind: Skill
Install: Single
Trust: Established
Entrypoint: Volcano Overview
Direct install command
npx -y tokrepo@latest install e3acbb23-751f-11f1-9bc6-00163e2b0d79 --target codex

Run after dry-run confirms the install plan.

Introduction

Volcano is a CNCF incubating project that extends Kubernetes with batch scheduling capabilities. It was created to address the gap between Kubernetes' default scheduler and the demands of compute-intensive workloads such as ML training, genomics pipelines, and scientific simulations.

What Volcano Does

  • Provides gang scheduling so all pods in a job start together or not at all
  • Supports fair-share, priority, and preemption scheduling policies
  • Manages job lifecycles with dependency-aware task ordering
  • Integrates with frameworks like Spark, TensorFlow, PyTorch, and MPI
  • Offers queue-based resource management for multi-tenant clusters

Architecture Overview

Volcano consists of three main components. The Volcano Scheduler is a separate scheduler that runs alongside kube-scheduler and implements batch-oriented scheduling algorithms. The Volcano Controller Manager manages the lifecycle of custom resources such as Job, Queue, and PodGroup. The Volcano Admission Webhook validates and mutates those resources when they are submitted. By default, these components run as deployments in the volcano-system namespace and extend the Kubernetes API with custom resource definitions (CRDs).

Self-Hosting & Configuration

  • Deploy via Helm chart or YAML manifests from the official repository
  • Configure scheduling policies (actions and plugin tiers) in the scheduler configuration file, which is mounted from a ConfigMap in the volcano-system namespace
  • Set up Queues with resource quotas for multi-tenant isolation
  • Tune gang scheduling per job with minAvailable on the Volcano Job, which sets minMember on the job's PodGroup
  • Monitor through Prometheus metrics exposed by the scheduler
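
The Queue and per-job gang settings above can be sketched as manifests. This is a minimal illustration, assuming the scheduling.volcano.sh/v1beta1 Queue and batch.volcano.sh/v1alpha1 Job APIs; the queue name "research", the job name "demo-train", the resource figures, and the busybox image are hypothetical placeholders. The script prints JSON, which kubectl accepts in the same way as YAML.

```python
import json


def make_queue(name, weight, cpu, memory):
    """Build a Volcano Queue manifest capped at the given CPU and memory."""
    return {
        "apiVersion": "scheduling.volcano.sh/v1beta1",
        "kind": "Queue",
        "metadata": {"name": name},
        "spec": {
            "weight": weight,  # relative share when queues compete
            "capability": {"cpu": cpu, "memory": memory},  # upper bound
        },
    }


def make_job(name, queue, replicas, min_available, image):
    """Build a Volcano Job that starts only if min_available pods fit."""
    if not 1 <= min_available <= replicas:
        raise ValueError("min_available must be between 1 and replicas")
    container = {"name": "worker", "image": image, "command": ["sleep", "60"]}
    return {
        "apiVersion": "batch.volcano.sh/v1alpha1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "schedulerName": "volcano",
            "queue": queue,
            "minAvailable": min_available,  # gang size for this job
            "tasks": [
                {
                    "name": "worker",
                    "replicas": replicas,
                    "template": {
                        "spec": {
                            "restartPolicy": "Never",
                            "containers": [container],
                        }
                    },
                }
            ],
        },
    }


# Hypothetical names and sizes, chosen only for illustration.
queue = make_queue("research", weight=2, cpu="16", memory="64Gi")
job = make_job("demo-train", queue="research", replicas=4,
               min_available=4, image="busybox:1.36")

for manifest in (queue, job):
    print(json.dumps(manifest, indent=2))
```

Setting min_available equal to replicas asks for strict all-or-nothing placement; a lower value would let the job start with a partial set of workers.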

Key Features

  • Gang scheduling ensures all-or-nothing pod allocation for distributed jobs
  • Multiple scheduling algorithms, packaged as plugins: gang, binpack, fair-share, DRF (Dominant Resource Fairness), and proportion
  • Native CRD-based job management with task-level dependency graphs
  • Queue management with hierarchical resource allocation
  • Plugin-based scheduler architecture for custom scheduling logic
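
Of the algorithms listed, DRF (Dominant Resource Fairness) is the easiest to show in a few lines. The sketch below illustrates the general idea only and is not Volcano's plugin code: each job's dominant share is its largest fractional use of any single resource, and the job with the smallest dominant share is served next. The job names and cluster sizes are hypothetical.

```python
def dominant_share(allocated, capacity):
    """Largest fraction of any one cluster resource held by a job."""
    return max(allocated.get(res, 0) / capacity[res] for res in capacity)


def next_job(allocations, capacity):
    """Pick the job with the smallest dominant share (served next under DRF)."""
    return min(allocations, key=lambda j: dominant_share(allocations[j], capacity))


# Hypothetical cluster: 10 CPUs and 40 GiB of memory.
capacity = {"cpu": 10, "memory": 40}
allocations = {
    "job-a": {"cpu": 2, "memory": 16},  # shares: cpu 0.2, memory 0.4
    "job-b": {"cpu": 3, "memory": 4},   # shares: cpu 0.3, memory 0.1
}

for name, alloc in allocations.items():
    print(name, round(dominant_share(alloc, capacity), 2))
print("next:", next_job(allocations, capacity))
```

Here job-a is memory-dominant and job-b is CPU-dominant, so the comparison is made across different resources, which is the point of the technique.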

Comparison with Similar Tools

  • Default kube-scheduler: handles general workloads but lacks gang scheduling and job-level awareness
  • Apache YuniKorn: a comparable batch scheduler with a different queue model and approach to resource partitioning
  • Kueue: a newer Kubernetes-native job queueing system focused on quota management, with less room for customizing scheduling
  • Armada: multi-cluster job scheduling aimed at larger scale, with a more complex setup

FAQ

Q: Does Volcano replace the default Kubernetes scheduler? A: No. Volcano runs alongside the default scheduler. You assign workloads to Volcano by setting schedulerName: volcano in your pod spec.
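
As a minimal sketch of that opt-in, the helper below sets schedulerName on a copy of an ordinary pod manifest. The helper name use_volcano, the pod name, and the image are hypothetical; only the spec.schedulerName field and the value "volcano" come from the answer above.

```python
import copy
import json


def use_volcano(pod_manifest, scheduler_name="volcano"):
    """Return a copy of the pod manifest that is routed to the named scheduler."""
    updated = copy.deepcopy(pod_manifest)  # leave the caller's manifest untouched
    updated.setdefault("spec", {})["schedulerName"] = scheduler_name
    return updated


# A plain pod that would otherwise go to the default scheduler.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "demo-pod"},
    "spec": {
        "containers": [{"name": "main", "image": "busybox:1.36",
                        "command": ["sleep", "60"]}],
    },
}

volcano_pod = use_volcano(pod)
print(json.dumps(volcano_pod, indent=2))
```

Pods without this field continue to be placed by the default scheduler, so both schedulers can serve the same cluster.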

Q: Can Volcano schedule GPU workloads? A: Yes. Volcano supports GPU scheduling and can enforce topology-aware placement for multi-GPU training jobs.

Q: What is gang scheduling and why does it matter? A: Gang scheduling places all pods in a group together or none of them. Without it, two distributed training jobs can each hold part of the cluster while waiting for the rest, so neither can start and the resources they hold sit idle.
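
The deadlock can be seen in a toy model. The sketch below is not how either scheduler is implemented: it assumes a cluster with six identical slots and two hypothetical jobs that each need four pods, and it compares pod-by-pod placement with all-or-nothing placement.

```python
def schedule_pod_by_pod(jobs, free_slots):
    """Place one pod at a time in round-robin order, with no group awareness."""
    placed = {name: 0 for name in jobs}
    progress = True
    while free_slots > 0 and progress:
        progress = False
        for name, need in jobs.items():
            if free_slots > 0 and placed[name] < need:
                placed[name] += 1
                free_slots -= 1
                progress = True
    return placed


def schedule_gang(jobs, free_slots):
    """Admit a job only if all of its pods fit; otherwise place none of them."""
    placed = {}
    for name, need in jobs.items():
        if need <= free_slots:
            placed[name] = need
            free_slots -= need
        else:
            placed[name] = 0
    return placed


def runnable(jobs, placed):
    """Jobs that received every pod they need and can therefore start."""
    return [name for name, need in jobs.items() if placed[name] >= need]


jobs = {"train-a": 4, "train-b": 4}  # pods required per job (hypothetical)

naive = schedule_pod_by_pod(jobs, free_slots=6)
gang = schedule_gang(jobs, free_slots=6)

print("pod-by-pod:", naive, "runnable:", runnable(jobs, naive))
print("gang:      ", gang, "runnable:", runnable(jobs, gang))
```

In the pod-by-pod case each job ends up holding three slots and neither can start; with gang placement one job runs to completion while the other waits without holding anything.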

Q: Does Volcano work with managed Kubernetes services? A: Yes. Volcano runs on any conformant Kubernetes cluster including EKS, GKE, and AKS.
