What is Volcano — Kubernetes Batch and HPC Job Scheduler?

Volcano is a cloud-native batch scheduling system for Kubernetes that supports machine learning, deep learning, bioinformatics, and high-performance computing workloads with advanced scheduling policies.

Is Volcano — Kubernetes Batch and HPC Job Scheduler free to use?

Yes. Volcano — Kubernetes Batch and HPC Job Scheduler is freely available on TokRepo. Check the Source & Thanks section on the asset page for the specific open-source license.

How do I install Volcano — Kubernetes Batch and HPC Job Scheduler?

Visit the asset page on TokRepo and click "Copy for agent" to get the installation instructions. Most assets can be installed with a single command.

Volcano — Kubernetes Batch and HPC Job Scheduler

Introduction

Volcano is a CNCF incubating project that extends Kubernetes with batch scheduling capabilities. It was created to address the gap between Kubernetes' default scheduler and the demands of compute-intensive workloads such as ML training, genomics pipelines, and scientific simulations.

What Volcano Does

Provides gang scheduling so all pods in a job start together or not at all
Supports fair-share, priority, and preemption scheduling policies
Manages job lifecycles with dependency-aware task ordering
Integrates with frameworks like Spark, TensorFlow, PyTorch, and MPI
Offers queue-based resource management for multi-tenant clusters

Architecture Overview

Volcano consists of three main components: the Volcano Scheduler (a separate scheduler that runs alongside kube-scheduler and implements batch-oriented scheduling algorithms), the Volcano Controller Manager (which manages the lifecycle of custom resources such as Job, Queue, and PodGroup), and the Volcano Admission Webhook (which validates and mutates those resources on creation). These components run as deployments in the volcano-system namespace and extend the Kubernetes API with custom resource definitions.

Self-Hosting & Configuration

Deploy via Helm chart or YAML manifests from the official repository
Configure scheduling policies (actions and plugin tiers) through the scheduler's ConfigMap, named volcano-scheduler-configmap in a default install
Set up Queues with resource quotas for multi-tenant isolation
Tune gang scheduling per job via the Job's minAvailable field, which maps to minMember on the underlying PodGroup
Monitor through Prometheus metrics exposed by the scheduler
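As a rough illustration of the Queue and per-job gang settings listed above, the following Python sketch builds a Queue and a Job manifest as plain dictionaries and prints them as one JSON List, a format kubectl accepts alongside YAML. The API versions and field names (weight, capability, queue, minAvailable, tasks) are believed to match the upstream CRDs but should be checked against the version installed in your cluster. The names team-a and train-demo, the busybox image, and all resource values are hypothetical.

```python
import json


def make_queue(name, weight, cpu, memory):
    """Build a Volcano Queue manifest (assumed API: scheduling.volcano.sh/v1beta1)."""
    return {
        "apiVersion": "scheduling.volcano.sh/v1beta1",
        "kind": "Queue",
        "metadata": {"name": name},
        "spec": {
            "weight": weight,  # relative share among queues
            "capability": {"cpu": cpu, "memory": memory},  # upper bound for the queue
        },
    }


def make_job(name, queue, replicas, min_available, image, command):
    """Build a Volcano Job manifest (assumed API: batch.volcano.sh/v1alpha1)."""
    if not 1 <= min_available <= replicas:
        raise ValueError("min_available must be between 1 and replicas")
    container = {
        "name": "worker",
        "image": image,
        "command": command,
        "resources": {"requests": {"cpu": "1", "memory": "1Gi"}},
    }
    return {
        "apiVersion": "batch.volcano.sh/v1alpha1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "schedulerName": "volcano",
            "queue": queue,
            "minAvailable": min_available,  # gang size: start only if this many pods fit
            "tasks": [
                {
                    "name": "worker",
                    "replicas": replicas,
                    "template": {
                        "spec": {
                            "restartPolicy": "Never",
                            "containers": [container],
                        }
                    },
                }
            ],
        },
    }


# Hypothetical example values.
queue = make_queue("team-a", weight=2, cpu="16", memory="64Gi")
job = make_job(
    "train-demo",
    queue="team-a",
    replicas=3,
    min_available=3,
    image="busybox:1.36",
    command=["sh", "-c", "echo training; sleep 30"],
)

# A v1 List lets both objects travel in a single JSON document.
bundle = {"apiVersion": "v1", "kind": "List", "items": [queue, job]}
print(json.dumps(bundle, indent=2))
```

If the script is saved as, say, manifests.py, its output could be piped into kubectl apply -f - on a cluster where Volcano is already installed.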

Key Features

Gang scheduling ensures all-or-nothing pod allocation for distributed jobs
Multiple scheduling plugins: gang, binpack, priority, DRF (dominant resource fairness), and proportion (weighted fair share across queues)
Native CRD-based job management with task-level dependency graphs
Queue management with hierarchical resource allocation
Plugin-based scheduler architecture for custom scheduling logic
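To make the DRF entry above more concrete, here is a minimal Python sketch of the dominant resource fairness idea: each job's dominant share is its largest fractional use of any single cluster resource, and the job with the smallest dominant share is served next. This is a simplified model for illustration, not Volcano's plugin code. The cluster capacity, job names, and allocations are made up.

```python
def dominant_share(allocated, capacity):
    """Return the largest fraction of any one resource that a job holds."""
    return max(allocated.get(resource, 0) / capacity[resource] for resource in capacity)


def next_job(jobs, capacity):
    """Pick the job with the smallest dominant share, as DRF would favor it."""
    return min(jobs, key=lambda name: dominant_share(jobs[name], capacity))


# Hypothetical cluster totals and current per-job allocations.
capacity = {"cpu": 32, "memory_gib": 128, "gpu": 8}
jobs = {
    "train": {"cpu": 8, "memory_gib": 16, "gpu": 4},  # dominant resource: gpu (4/8)
    "etl": {"cpu": 12, "memory_gib": 32},             # dominant resource: cpu (12/32)
}

for name, allocated in jobs.items():
    print(name, dominant_share(allocated, capacity))
print("next:", next_job(jobs, capacity))
```

In this example the train job already holds half of the GPUs, so the etl job, whose largest share is 37.5% of the CPUs, would be preferred for the next allocation.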

Comparison with Similar Tools

Default kube-scheduler — handles general workloads but lacks gang scheduling and job-level awareness
Apache YuniKorn — similar batch scheduler with different queue model and resource partitioning
Kueue — newer Kubernetes-native job queueing focused on quota management, less scheduling customization
Armada — multi-cluster job scheduling at larger scale, more complex setup

FAQ

Q: Does Volcano replace the default Kubernetes scheduler? A: No. Volcano runs alongside the default scheduler. You assign workloads to Volcano by setting schedulerName: volcano in your pod spec.
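As a small sketch of that opt-in step, the hypothetical helper below sets schedulerName on a manifest held as a Python dictionary. It assumes the manifest is either a Pod or a workload whose pod template lives at spec.template.spec (for example a Deployment or StatefulSet); other kinds, such as CronJob, nest the template differently and are not handled here.

```python
import copy


def use_volcano(manifest, scheduler_name="volcano"):
    """Return a copy of the manifest whose pods are assigned to the given scheduler."""
    patched = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    spec = patched.setdefault("spec", {})
    if patched.get("kind") == "Pod":
        pod_spec = spec
    else:
        # Assumes a pod template at spec.template.spec (Deployment, StatefulSet, ...).
        pod_spec = spec.setdefault("template", {}).setdefault("spec", {})
    pod_spec["schedulerName"] = scheduler_name
    return patched


# Hypothetical Deployment used only for illustration.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "demo"},
    "spec": {
        "replicas": 2,
        "selector": {"matchLabels": {"app": "demo"}},
        "template": {
            "metadata": {"labels": {"app": "demo"}},
            "spec": {"containers": [{"name": "demo", "image": "busybox:1.36"}]},
        },
    },
}

patched = use_volcano(deployment)
print(patched["spec"]["template"]["spec"]["schedulerName"])
```

Workloads without the field keep using the default scheduler, which is why the two can coexist in one cluster.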

Q: Can Volcano schedule GPU workloads? A: Yes. Volcano supports GPU scheduling and can enforce topology-aware placement for multi-GPU training jobs.

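Q: What is gang scheduling and why does it matter? A: Gang scheduling places a group of pods only when the minimum number the job needs can all start together. Without it, several distributed jobs can each hold part of the resources they need while waiting for the rest, so none of them makes progress and the held resources sit idle.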
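The difference can be shown with a toy model in Python. It assumes every pod needs exactly one slot and every job needs all of its pods to run; the function names and job sizes are illustrative, and the logic is a sketch of the all-or-nothing idea rather than Volcano's actual algorithm.

```python
def place_pod_by_pod(free_slots, jobs):
    """Place pods one at a time, round-robin across jobs, with no gang awareness."""
    placed = {name: 0 for name in jobs}
    pending = dict(jobs)
    while free_slots > 0 and any(pending.values()):
        for name in jobs:
            if free_slots == 0:
                break
            if pending[name] > 0:
                placed[name] += 1
                pending[name] -= 1
                free_slots -= 1
    return placed


def place_gang(free_slots, jobs):
    """Place a job only if all of its pods fit; otherwise place none of them."""
    placed = {}
    for name, needed in jobs.items():
        if needed <= free_slots:
            placed[name] = needed
            free_slots -= needed
        else:
            placed[name] = 0  # job waits without holding any slots
    return placed


def runnable(placed, jobs):
    """A job can make progress only when every one of its pods is placed."""
    return [name for name, needed in jobs.items() if placed[name] >= needed]


# Two hypothetical 3-pod jobs competing for 4 free slots.
jobs = {"job-a": 3, "job-b": 3}

greedy = place_pod_by_pod(4, jobs)
gang = place_gang(4, jobs)
print(greedy, runnable(greedy, jobs))  # each job holds 2 slots, none can run
print(gang, runnable(gang, jobs))      # job-a holds 3 slots and runs, job-b waits
```

With pod-by-pod placement all four slots are occupied yet no job can start, while the gang variant runs one job to completion and leaves the other queued until capacity frees up.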

Q: Does Volcano work with managed Kubernetes services? A: Yes. Volcano runs on any conformant Kubernetes cluster including EKS, GKE, and AKS.
