Scripts · Jul 1, 2026 · 3 min read

Volcano — Kubernetes Batch and HPC Job Scheduler

Volcano is a cloud-native batch scheduling system for Kubernetes that supports machine learning, deep learning, bioinformatics, and high-performance computing workloads with advanced scheduling policies.

Direct install command
npx -y tokrepo@latest install e3acbb23-751f-11f1-9bc6-00163e2b0d79 --target codex

Run after confirming the plan with a dry run.

Introduction

Volcano is a CNCF incubating project that extends Kubernetes with batch scheduling capabilities. It was created to address the gap between Kubernetes' default scheduler and the demands of compute-intensive workloads such as ML training, genomics pipelines, and scientific simulations.

What Volcano Does

  • Provides gang scheduling so all pods in a job start together or not at all
  • Supports fair-share, priority, and preemption scheduling policies
  • Manages job lifecycles with dependency-aware task ordering
  • Integrates with frameworks like Spark, TensorFlow, PyTorch, and MPI
  • Offers queue-based resource management for multi-tenant clusters

Architecture Overview

Volcano consists of three main components: the Volcano Scheduler (a separate scheduler that runs alongside kube-scheduler and implements batch-oriented scheduling algorithms), the Volcano Controller Manager (which manages the lifecycle of CRDs such as Job, Queue, and PodGroup), and the Volcano Admission Webhook (which validates and mutates those resources when they are submitted). In a default installation, these components run as Deployments in the volcano-system namespace and extend the Kubernetes API with custom resource definitions.

Self-Hosting & Configuration

  • Deploy via Helm chart or YAML manifests from the official repository
  • Configure scheduling policies (actions and plugin tiers) through the scheduler's ConfigMap (volcano-scheduler-configmap in a default install)
  • Set up Queues with resource quotas for multi-tenant isolation
  • Tune gang scheduling per job via the Job's minAvailable field, which Volcano carries over to the PodGroup as minMember
  • Monitor through Prometheus metrics exposed by the scheduler
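
As a rough illustration of the Queue and per-job gang settings above, the following Python sketch builds a Queue and a Job manifest as plain dictionaries and prints them as a JSON List, which kubectl can apply. The queue name team-a, the job name demo-training, the busybox image, and the resource figures are hypothetical. The API versions and field names are assumptions based on current Volcano releases and should be checked against the installed version.

```python
import json


def build_queue(name, weight, cpu, memory):
    """Return a Volcano Queue manifest as a plain dict (values are illustrative)."""
    return {
        "apiVersion": "scheduling.volcano.sh/v1beta1",
        "kind": "Queue",
        "metadata": {"name": name},
        "spec": {
            "weight": weight,  # relative share when queues compete
            "capability": {"cpu": cpu, "memory": memory},  # upper bound for the queue
        },
    }


def build_job(name, queue, image, replicas, min_available):
    """Return a Volcano Job manifest; min_available is the gang size."""
    if min_available > replicas:
        raise ValueError("min_available cannot exceed replicas")
    return {
        "apiVersion": "batch.volcano.sh/v1alpha1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "schedulerName": "volcano",
            "queue": queue,
            "minAvailable": min_available,
            "tasks": [
                {
                    "name": "worker",
                    "replicas": replicas,
                    "template": {
                        "spec": {
                            "restartPolicy": "Never",
                            "containers": [
                                {
                                    "name": "worker",
                                    "image": image,
                                    "command": ["sh", "-c", "echo hello && sleep 30"],
                                }
                            ],
                        }
                    },
                }
            ],
        },
    }


# Hypothetical names and sizes, chosen only for the example.
queue = build_queue("team-a", weight=1, cpu="8", memory="16Gi")
job = build_job("demo-training", "team-a", "busybox:1.36", replicas=4, min_available=4)

# kubectl accepts JSON as well as YAML; a v1 List bundles both objects.
manifest_json = json.dumps(
    {"apiVersion": "v1", "kind": "List", "items": [queue, job]}, indent=2
)
print(manifest_json)
```

Saved under a hypothetical file name such as make_manifests.py, the output could be piped into the cluster with python make_manifests.py | kubectl apply -f -. Setting min_available equal to replicas asks for strict all-or-nothing placement; a lower value lets the job start once that many pods can be placed.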

Key Features

  • Gang scheduling ensures all-or-nothing pod allocation for distributed jobs
  • Multiple scheduling plugins: gang, binpack, and fair-share policies such as DRF (Dominant Resource Fairness) and proportion
  • Native CRD-based job management with task-level dependency graphs
  • Queue management with hierarchical resource allocation
  • Plugin-based scheduler architecture for custom scheduling logic
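
To make the DRF entry in the list above more concrete, here is a minimal Python sketch of the textbook Dominant Resource Fairness loop: at each step the user with the lowest dominant share receives one more task, if it still fits. This is a toy model of the idea, not Volcano's plugin code, and the cluster size and per-task demands are hypothetical.

```python
def dominant_share(allocated, capacity):
    """Largest fraction of any single resource that a user currently holds."""
    return max(allocated[r] / capacity[r] for r in capacity)


def drf_allocate(capacity, demands):
    """Hand out tasks one at a time, always favouring the lowest dominant share.

    capacity: {"cpu": 9, "memory": 18}
    demands:  {"user": {"cpu": 1, "memory": 4}, ...}  (per-task demand)
    Returns the number of tasks placed per user.
    """
    for user, demand in demands.items():
        if not any(demand[r] > 0 for r in capacity):
            raise ValueError(f"{user} must request at least one resource")

    used = {r: 0 for r in capacity}
    allocated = {u: {r: 0 for r in capacity} for u in demands}
    tasks = {u: 0 for u in demands}

    while True:
        # Stable sort: ties keep the insertion order of the demands dict.
        order = sorted(demands, key=lambda u: dominant_share(allocated[u], capacity))
        for user in order:
            demand = demands[user]
            if all(used[r] + demand[r] <= capacity[r] for r in capacity):
                for r in capacity:
                    used[r] += demand[r]
                    allocated[user][r] += demand[r]
                tasks[user] += 1
                break
        else:
            # No user's next task fits, so the allocation is final.
            return tasks


# Hypothetical cluster: 9 CPUs and 18 GB of memory shared by two users.
capacity = {"cpu": 9, "memory": 18}
demands = {
    "A": {"cpu": 1, "memory": 4},  # memory-heavy tasks
    "B": {"cpu": 3, "memory": 1},  # CPU-heavy tasks
}
result = drf_allocate(capacity, demands)
print(result)  # {'A': 3, 'B': 2}
```

User A ends up with 12 of 18 GB and user B with 6 of 9 CPUs, so both hold two thirds of their dominant resource, which is the balance DRF aims for.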

Comparison with Similar Tools

  • Default kube-scheduler — handles general workloads but lacks gang scheduling and job-level awareness
  • Apache YuniKorn — similar batch scheduler with different queue model and resource partitioning
  • Kueue — newer Kubernetes-native job queueing focused on quota management, less scheduling customization
  • Armada — multi-cluster job scheduling at larger scale, more complex setup

FAQ

Q: Does Volcano replace the default Kubernetes scheduler? A: No. Volcano runs alongside the default scheduler. You assign workloads to Volcano by setting schedulerName: volcano in your pod spec.
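
The schedulerName setting from the answer above can be sketched in Python as a small helper that copies a pod manifest and points it at Volcano. The helper name use_volcano, the pod name, and the image are hypothetical; only the schedulerName: volcano field comes from the text.

```python
import copy
import json


def use_volcano(pod_manifest):
    """Return a copy of the pod manifest that asks Volcano to schedule it."""
    patched = copy.deepcopy(pod_manifest)  # leave the caller's dict untouched
    patched.setdefault("spec", {})["schedulerName"] = "volcano"
    return patched


# Hypothetical pod, used only to show where the field goes.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "demo-pod"},
    "spec": {
        "containers": [{"name": "main", "image": "busybox:1.36", "command": ["sleep", "60"]}]
    },
}

volcano_pod = use_volcano(pod)
print(json.dumps(volcano_pod, indent=2))
```

Pods without the field keep going to the default scheduler, which is why both schedulers can coexist in one cluster.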

Q: Can Volcano schedule GPU workloads? A: Yes. Volcano supports GPU scheduling and can enforce topology-aware placement for multi-GPU training jobs.

Q: What is gang scheduling and why does it matter? A: Gang scheduling ensures all pods in a group are scheduled together or not at all. Without it, two distributed training jobs can each hold part of the resources they need while waiting for the rest, so neither starts and the held resources sit idle.
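
The deadlock described in this answer can be reproduced with a toy Python model. It compares a scheduler that places pods one at a time with an all-or-nothing (gang) policy. The slot count, job names, and both scheduling functions are hypothetical simplifications and do not reflect Volcano's internal implementation.

```python
def schedule_pod_by_pod(slots, jobs):
    """Place one pod per job in turn until the cluster is full."""
    placed = {name: 0 for name in jobs}
    free = slots
    progress = True
    while free > 0 and progress:
        progress = False
        for name, needed in jobs.items():
            if free > 0 and placed[name] < needed:
                placed[name] += 1
                free -= 1
                progress = True
    return placed


def schedule_gang(slots, jobs):
    """Place a job only if all of its pods fit; otherwise place none."""
    placed = {}
    free = slots
    for name, needed in jobs.items():
        if needed <= free:
            placed[name] = needed
            free -= needed
        else:
            placed[name] = 0  # job waits without holding any slots
    return placed


def runnable(placed, jobs):
    """Jobs whose full set of pods has been placed and can start training."""
    return [name for name in jobs if placed[name] == jobs[name]]


# Two jobs that each need 4 workers, on a cluster with room for 6 pods.
jobs = {"job-a": 4, "job-b": 4}
slots = 6

naive = schedule_pod_by_pod(slots, jobs)
gang = schedule_gang(slots, jobs)

print(naive, runnable(naive, jobs))  # {'job-a': 3, 'job-b': 3} []
print(gang, runnable(gang, jobs))    # {'job-a': 4, 'job-b': 0} ['job-a']
```

In the pod-by-pod case all six slots are occupied yet no job can run. With the gang policy, job-a runs and job-b waits without consuming anything.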

Q: Does Volcano work with managed Kubernetes services? A: Yes. Volcano runs on any conformant Kubernetes cluster including EKS, GKE, and AKS.
