What is KubeRay?

KubeRay is a Kubernetes operator that manages Ray clusters on Kubernetes, enabling distributed AI training, serving, and data processing workloads with automatic scaling and lifecycle management.

Is KubeRay free to use?

Yes. KubeRay is freely available on TokRepo. Check the Source & Thanks section on the asset page for the specific open-source license.

How do I install KubeRay?

Visit the asset page on TokRepo and click "Copy for agent" to get the installation instructions. Most assets can be installed with a single command.

KubeRay — Run Ray Distributed Computing on Kubernetes

Introduction

KubeRay brings the Ray distributed computing framework to Kubernetes as a first-class citizen. Ray is widely used for distributed training, model serving (Ray Serve), and data processing, but managing Ray clusters manually is complex. KubeRay automates cluster provisioning, scaling, fault recovery, and upgrades through Kubernetes-native custom resources.

What KubeRay Does

Deploys and manages Ray clusters on Kubernetes via CRDs
Provides RayCluster, RayJob, and RayService custom resources
Auto-scales Ray worker nodes based on workload demand
Recreates failed head and worker pods automatically (recovering head state additionally requires GCS fault tolerance)
Integrates with Kubernetes scheduling, RBAC, and resource quotas

Architecture Overview

KubeRay consists of four parts: the KubeRay operator (a controller that watches the custom resources and reconciles cluster state), the RayCluster CRD (declares a Ray head plus one or more worker groups), the RayJob CRD (runs a one-off job on a cluster the operator manages for it), and the RayService CRD (deploys a long-running Ray Serve application with rolling upgrades). The operator creates pods and services, and optionally an ingress, to match the desired state. When autoscaling is enabled, the Ray autoscaler runs alongside the head node and adjusts the worker replica counts on the RayCluster resource, which the operator then reconciles into pods.
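To make the RayCluster shape concrete, here is a minimal sketch that builds a manifest as a Python dictionary and prints it as JSON, which kubectl accepts alongside YAML. The cluster name, group name, image tag, and resource sizes are illustrative assumptions, not values from this page; check the field names against the CRD version installed in your cluster.

```python
import json


def ray_container(name, image, cpu="1", memory="2Gi"):
    """Return a container spec for a Ray head or worker pod."""
    sizes = {"cpu": cpu, "memory": memory}
    return {
        "name": name,
        "image": image,
        "resources": {"requests": dict(sizes), "limits": dict(sizes)},
    }


def ray_cluster(name, image, workers=2):
    """Build a RayCluster manifest with one head and one worker group."""
    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayCluster",
        "metadata": {"name": name},
        "spec": {
            # The head pod runs the Ray control processes.
            "headGroupSpec": {
                "rayStartParams": {},
                "template": {
                    "spec": {"containers": [ray_container("ray-head", image)]}
                },
            },
            # Each worker group is a pool of identical worker pods.
            "workerGroupSpecs": [
                {
                    "groupName": "cpu-workers",  # hypothetical group name
                    "replicas": workers,
                    "minReplicas": 1,
                    "maxReplicas": workers,
                    "rayStartParams": {},
                    "template": {
                        "spec": {"containers": [ray_container("ray-worker", image)]}
                    },
                }
            ],
        },
    }


# Assumed names and image tag, chosen only for illustration.
manifest = ray_cluster("demo-cluster", "rayproject/ray:2.9.0")
print(json.dumps(manifest, indent=2))
```

Piping the output into `kubectl apply -f -` would hand the manifest to the operator, assuming the KubeRay CRDs and operator are already installed.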

Self-Hosting & Configuration

Deploy the KubeRay operator via Helm into a dedicated namespace
Define RayCluster resources with head node and worker group specs including GPU requests
Configure Ray autoscaler parameters for dynamic worker scaling
Set resource limits and node affinity for GPU and CPU worker pools
Use RayService for production serving with zero-downtime upgrades
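The GPU and autoscaling steps above can be sketched as a fragment of a RayCluster spec. This is an illustration under assumptions: the node label, image tag, CPU and memory sizes, and replica bounds are hypothetical, and a plain nodeSelector stands in for a full node affinity rule to keep the example short.

```python
import json


def gpu_worker_group(name, image, gpus, min_replicas, max_replicas, node_selector=None):
    """Build a worker group whose pods each request `gpus` NVIDIA GPUs."""
    sizes = {"cpu": "4", "memory": "16Gi", "nvidia.com/gpu": str(gpus)}
    pod_spec = {
        "containers": [
            {
                "name": "ray-worker",
                "image": image,
                # Extended resources such as GPUs are set under limits;
                # requests are kept equal to limits here.
                "resources": {"requests": dict(sizes), "limits": dict(sizes)},
            }
        ]
    }
    if node_selector:
        # Pin the group to nodes carrying these labels.
        pod_spec["nodeSelector"] = dict(node_selector)
    return {
        "groupName": name,
        "replicas": min_replicas,
        "minReplicas": min_replicas,
        "maxReplicas": max_replicas,
        "rayStartParams": {},
        "template": {"spec": pod_spec},
    }


group = gpu_worker_group(
    "gpu-workers",                          # hypothetical group name
    "rayproject/ray:2.9.0-gpu",             # assumed image tag
    gpus=1,
    min_replicas=0,
    max_replicas=4,
    node_selector={"accelerator": "nvidia"},  # hypothetical node label
)

# Fields that would sit under `spec` of a RayCluster manifest.
spec_fragment = {
    "enableInTreeAutoscaling": True,
    "autoscalerOptions": {"idleTimeoutSeconds": 60},
    "workerGroupSpecs": [group],
}
print(json.dumps(spec_fragment, indent=2))
```

With autoscaling enabled, the replica count is expected to move between minReplicas and maxReplicas as Ray requests or releases resources.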

Key Features

Three CRDs cover clusters, batch jobs, and serving workloads
Autoscaling integrates Ray's built-in autoscaler with Kubernetes pod scheduling
Rolling upgrades for Ray Serve applications with zero-downtime deployments
GPU scheduling support for distributed training and inference workloads
Compatible with cloud-managed Kubernetes and bare-metal clusters
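For the serving case, a RayService manifest pairs a Ray Serve configuration with the cluster spec it should run on. The sketch below assumes a hypothetical application import path, service name, and image tag; only the overall shape is the point.

```python
import json

# Ray Serve config, embedded in the manifest as a string.
# The application name and import path are hypothetical.
SERVE_CONFIG = """\
applications:
  - name: demo_app
    import_path: my_package.app:entrypoint
    route_prefix: /
"""


def ray_service(name, image, serve_config):
    """Build a RayService manifest with a head-only cluster definition."""
    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayService",
        "metadata": {"name": name},
        "spec": {
            "serveConfigV2": serve_config,
            # Same structure as the `spec` of a RayCluster.
            "rayClusterConfig": {
                "headGroupSpec": {
                    "rayStartParams": {},
                    "template": {
                        "spec": {
                            "containers": [{"name": "ray-head", "image": image}]
                        }
                    },
                }
            },
        },
    }


service = ray_service("demo-service", "rayproject/ray:2.9.0", SERVE_CONFIG)
print(json.dumps(service, indent=2))
```

Keeping the Serve config and the cluster definition in one resource is what lets the operator roll out a change to either one as a single upgrade.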

Comparison with Similar Tools

Manual Ray deployment: requires hand-managed VMs or containers and offers no auto-recovery
Ray on Spark: runs Ray within Spark clusters and uses a different resource model
Kubeflow: a broader ML platform with its own training operators, whereas KubeRay focuses specifically on Ray
Volcano: a batch scheduler that can co-exist with KubeRay to gang-schedule Ray jobs

FAQ

Q: Do I need to modify my Ray code to use KubeRay? A: No. Your existing Ray scripts run unchanged. KubeRay handles the infrastructure; Ray code connects to the head node as usual.
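As an illustration of running an unchanged script, the sketch below wraps an existing entrypoint command in a RayJob manifest. The script path, job name, and image tag are hypothetical, and the script is assumed to be present in the image already.

```python
import json


def ray_job(name, image, entrypoint, ttl_seconds=300):
    """Build a RayJob manifest that runs `entrypoint` on a temporary cluster."""
    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayJob",
        "metadata": {"name": name},
        "spec": {
            # The command Ray runs; the script itself is not modified.
            "entrypoint": entrypoint,
            # Tear the cluster down once the job has finished.
            "shutdownAfterJobFinishes": True,
            "ttlSecondsAfterFinished": ttl_seconds,
            "rayClusterSpec": {
                "headGroupSpec": {
                    "rayStartParams": {},
                    "template": {
                        "spec": {
                            "containers": [{"name": "ray-head", "image": image}]
                        }
                    },
                }
            },
        },
    }


job = ray_job(
    "demo-job",
    "rayproject/ray:2.9.0",               # assumed image tag
    "python /home/ray/samples/train.py",  # hypothetical script path
)
print(json.dumps(job, indent=2))
```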

Q: How does KubeRay handle GPU scheduling? A: Worker group specs accept standard Kubernetes resource requests including nvidia.com/gpu. The operator creates pods with GPU requests, and Kubernetes schedules them onto GPU nodes.

Q: Can I run Ray Serve behind an ingress? A: Yes. KubeRay creates a head service that you can expose via Ingress or Gateway API for external traffic to Ray Serve endpoints.
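A minimal sketch of such an Ingress, built as a Python dictionary, might look like the following. The service name, host, and port are assumptions: check the actual service name with `kubectl get svc`, and note that 8000 is used here because it is Ray Serve's default HTTP port.

```python
import json


def serve_ingress(name, service_name, host, port=8000):
    """Build an Ingress that routes all paths on `host` to a Ray Serve service."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": {
            "rules": [
                {
                    "host": host,
                    "http": {
                        "paths": [
                            {
                                "path": "/",
                                "pathType": "Prefix",
                                "backend": {
                                    "service": {
                                        "name": service_name,
                                        "port": {"number": port},
                                    }
                                },
                            }
                        ]
                    },
                }
            ]
        },
    }


ingress = serve_ingress(
    "ray-serve-ingress",
    "demo-cluster-head-svc",  # assumed service name, verify in your cluster
    "serve.example.com",      # placeholder host
)
print(json.dumps(ingress, indent=2))
```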

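Q: What happens when the Ray head node crashes? A: KubeRay detects the failure and recreates the head pod. If GCS fault tolerance is configured (it is not on by default and relies on an external Redis store), workers can reconnect to the new head without restarting. Without it, the cluster's state is lost and running workloads have to start over.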
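One way GCS fault tolerance has been switched on is through an annotation on the RayCluster plus a Redis address passed to the head container. The sketch below applies both to an existing manifest; the Redis address is a placeholder, and newer KubeRay releases also offer a dedicated spec field for this, so treat the exact keys as assumptions to verify against your operator version.

```python
import copy
import json


def enable_gcs_ft(cluster, redis_address):
    """Return a copy of a RayCluster manifest with GCS fault tolerance settings."""
    patched = copy.deepcopy(cluster)  # leave the caller's manifest untouched
    annotations = patched["metadata"].setdefault("annotations", {})
    annotations["ray.io/ft-enabled"] = "true"
    head = patched["spec"]["headGroupSpec"]["template"]["spec"]["containers"][0]
    # Tell the head process where the external Redis store lives.
    head.setdefault("env", []).append(
        {"name": "RAY_REDIS_ADDRESS", "value": redis_address}
    )
    return patched


base = {
    "apiVersion": "ray.io/v1",
    "kind": "RayCluster",
    "metadata": {"name": "demo-cluster"},
    "spec": {
        "headGroupSpec": {
            "rayStartParams": {},
            "template": {
                "spec": {
                    "containers": [
                        {"name": "ray-head", "image": "rayproject/ray:2.9.0"}
                    ]
                }
            },
        }
    },
}

ft_cluster = enable_gcs_ft(base, "redis:6379")  # placeholder Redis address
print(json.dumps(ft_cluster, indent=2))
```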
