Scripts · Jul 1, 2026 · 3 min read

KubeRay — Run Ray Distributed Computing on Kubernetes

KubeRay is a Kubernetes operator that manages Ray clusters on Kubernetes, enabling distributed AI training, serving, and data processing workloads with automatic scaling and lifecycle management.

Agent ready

Ready-to-run agent install

This asset can be installed after the agent chooses its runtime, checks the plan, and runs the matching command.

Native · 98/100 · Policy: allow
Agent surface: Any MCP/CLI agent
Kind: Skill
Install: Single
Trust: Established
Entrypoint: KubeRay Overview
Direct install command:
npx -y tokrepo@latest install 7f97f9e3-7520-11f1-9bc6-00163e2b0d79 --target codex

Run after dry-run confirms the install plan.

Introduction

KubeRay brings the Ray distributed computing framework to Kubernetes as a first-class citizen. Ray is widely used for distributed training, model serving (Ray Serve), and data processing, but managing Ray clusters manually is complex. KubeRay automates cluster provisioning, scaling, fault recovery, and upgrades through Kubernetes-native custom resources.

What KubeRay Does

  • Deploys and manages Ray clusters on Kubernetes via CRDs
  • Provides RayCluster, RayJob, and RayService custom resources
  • Auto-scales Ray worker nodes based on workload demand
  • Recreates failed head and worker pods automatically
  • Integrates with Kubernetes scheduling, RBAC, and resource quotas

Architecture Overview

KubeRay consists of the KubeRay Operator (a controller that watches CRDs and reconciles cluster state), RayCluster CRD (declares a Ray head plus worker group configuration), RayJob CRD (runs a one-off job on a managed cluster), and RayService CRD (deploys a long-running Ray Serve application with rolling upgrades). The operator creates pods, services, and, where configured, ingress resources to match the desired state. When autoscaling is enabled, Ray's autoscaler adjusts the desired worker replica count, and the operator adds or removes worker pods to match.
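As a rough illustration of what a RayCluster resource declares, the sketch below builds a minimal manifest as a Python dictionary and prints it as JSON, which kubectl accepts as well as YAML. The cluster name, image tag, container names, and resource sizes are placeholder assumptions, not values taken from this page.

```python
import json


def ray_container(name, image, cpu="1", memory="2Gi"):
    """Build one container entry with identical requests and limits."""
    resources = {"cpu": cpu, "memory": memory}
    return {
        "name": name,
        "image": image,
        "resources": {"requests": dict(resources), "limits": dict(resources)},
    }


def ray_cluster(name, image, workers):
    """Return a minimal RayCluster manifest: one head pod plus one worker group."""
    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayCluster",
        "metadata": {"name": name},
        "spec": {
            "headGroupSpec": {
                "rayStartParams": {},
                "template": {
                    "spec": {"containers": [ray_container("ray-head", image)]}
                },
            },
            "workerGroupSpecs": [
                {
                    "groupName": "workers",  # hypothetical group name
                    "replicas": workers,
                    "minReplicas": workers,
                    "maxReplicas": workers,
                    "rayStartParams": {},
                    "template": {
                        "spec": {"containers": [ray_container("ray-worker", image)]}
                    },
                }
            ],
        },
    }


# Placeholder name and image tag; substitute the Ray version you actually run.
manifest = ray_cluster("demo-cluster", "rayproject/ray:2.9.0", workers=2)
print(json.dumps(manifest, indent=2))
```

Saving the script as, say, `cluster.py` and running `python cluster.py | kubectl apply -f -` would hand the manifest to the operator for reconciliation, assuming the operator and its CRDs are already installed.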

Self-Hosting & Configuration

  • Deploy the KubeRay operator via Helm into a dedicated namespace
  • Define RayCluster resources with head node and worker group specs including GPU requests
  • Configure Ray autoscaler parameters for dynamic worker scaling
  • Set resource limits and node affinity for GPU and CPU worker pools
  • Use RayService for production serving with zero-downtime upgrades
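The first step in the list above might look like the following sketch, which only assembles and prints the Helm commands rather than running them. The chart repository URL is the one the KubeRay project publishes at the time of writing; the namespace and release name are illustrative assumptions.

```python
import shlex

# Chart repository published by the KubeRay project (verify before use).
HELM_REPO_URL = "https://ray-project.github.io/kuberay-helm/"


def operator_install_commands(namespace="kuberay-system", release="kuberay-operator"):
    """Return the Helm invocations, as argv lists, for installing the operator.

    The default namespace and release name are hypothetical; pick your own.
    """
    return [
        ["helm", "repo", "add", "kuberay", HELM_REPO_URL],
        ["helm", "repo", "update"],
        [
            "helm", "install", release, "kuberay/kuberay-operator",
            "--namespace", namespace, "--create-namespace",
        ],
    ]


commands = operator_install_commands()
for argv in commands:
    # shlex.join quotes arguments so each line can be pasted into a shell.
    print(shlex.join(argv))
```

Returning argv lists instead of shell strings keeps the commands safe to pass to `subprocess.run` later without additional quoting.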

Key Features

  • Three CRDs cover clusters, batch jobs, and serving workloads
  • Autoscaling integrates Ray's built-in autoscaler with Kubernetes pod scheduling
  • Rolling upgrades for Ray Serve applications with zero-downtime deployments
  • GPU scheduling support for distributed training and inference workloads
  • Compatible with cloud-managed Kubernetes and bare-metal clusters
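To make the autoscaling feature concrete, here is a minimal sketch that switches on the in-tree autoscaler in a RayCluster spec and sets replica bounds for one worker group. The helper name `with_autoscaling`, the group name, and the chosen bounds are assumptions for illustration; check the field names against the CRD version you have installed.

```python
import copy


def with_autoscaling(cluster_spec, group_name, min_replicas, max_replicas,
                     idle_timeout_seconds=60):
    """Return a copy of a RayCluster spec with autoscaling enabled for one group."""
    if not 0 <= min_replicas <= max_replicas:
        raise ValueError("expected 0 <= min_replicas <= max_replicas")

    spec = copy.deepcopy(cluster_spec)  # leave the caller's spec untouched
    spec["enableInTreeAutoscaling"] = True
    spec["autoscalerOptions"] = {
        "upscalingMode": "Default",
        "idleTimeoutSeconds": idle_timeout_seconds,
    }

    for group in spec.get("workerGroupSpecs", []):
        if group["groupName"] == group_name:
            current = group.get("replicas", min_replicas)
            group["minReplicas"] = min_replicas
            group["maxReplicas"] = max_replicas
            # Clamp the starting replica count into the allowed range.
            group["replicas"] = min(max(current, min_replicas), max_replicas)
            return spec

    raise KeyError(f"no worker group named {group_name!r}")


# Hypothetical starting spec with a fixed-size worker group.
base_spec = {"workerGroupSpecs": [{"groupName": "workers", "replicas": 8}]}
scaled = with_autoscaling(base_spec, "workers", min_replicas=1, max_replicas=5)
print(scaled["workerGroupSpecs"][0])
```

The bounds tell the autoscaler how far it may move the replica count; Kubernetes still decides where the resulting pods are scheduled.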

Comparison with Similar Tools

  • Manual Ray deployment: requires hand-managed VMs or containers, with no automatic recovery
  • Ray on Spark: runs Ray inside Spark clusters, so resources are allocated by Spark rather than by Kubernetes
  • Kubeflow: a broader ML platform with its own training operators, whereas KubeRay focuses specifically on Ray
  • Volcano: a batch scheduler that can coexist with KubeRay to gang-schedule Ray jobs

FAQ

Q: Do I need to modify my Ray code to use KubeRay? A: Generally no. KubeRay manages the infrastructure, and Ray code connects to the head node as usual, so existing Ray scripts typically run unchanged.

Q: How does KubeRay handle GPU scheduling? A: Worker group specs accept standard Kubernetes resource requests including nvidia.com/gpu. The operator creates pods with GPU requests, and Kubernetes schedules them onto GPU nodes.
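A GPU worker group along these lines could be sketched as follows. The node selector label, the toleration, the image tag, and the CPU and memory sizes are assumptions about a hypothetical cluster; only the `nvidia.com/gpu` resource name comes from the answer above.

```python
import json


def gpu_worker_group(group_name, image, gpus_per_worker, replicas):
    """Return a worker group spec whose pods each request whole GPUs."""
    if gpus_per_worker < 1:
        raise ValueError("gpus_per_worker must be at least 1")

    container = {
        "name": "ray-worker",
        "image": image,
        "resources": {
            "requests": {"cpu": "4", "memory": "16Gi"},
            # Extended resources such as GPUs are declared under limits.
            "limits": {"cpu": "4", "memory": "16Gi", "nvidia.com/gpu": gpus_per_worker},
        },
    }
    pod_spec = {
        "containers": [container],
        # Hypothetical label identifying the GPU node pool.
        "nodeSelector": {"example.com/pool": "gpu"},
        # Assumes GPU nodes carry a matching taint; drop this if yours do not.
        "tolerations": [
            {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}
        ],
    }
    return {
        "groupName": group_name,
        "replicas": replicas,
        "minReplicas": replicas,
        "maxReplicas": replicas,
        "rayStartParams": {},
        "template": {"spec": pod_spec},
    }


gpu_group = gpu_worker_group("gpu-workers", "rayproject/ray:2.9.0-gpu",
                             gpus_per_worker=1, replicas=2)
print(json.dumps(gpu_group, indent=2))
```

The resulting dictionary would be appended to `workerGroupSpecs` in a RayCluster manifest, alongside any CPU-only groups.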

Q: Can I run Ray Serve behind an ingress? A: Yes. KubeRay creates a head service that you can expose via Ingress or Gateway API for external traffic to Ray Serve endpoints.
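One way to expose such a service is a standard Kubernetes Ingress, sketched below as a Python dictionary. The service name, hostname, and ingress class are hypothetical, and port 8000 is assumed to be the Ray Serve HTTP port; confirm the actual service name and port with `kubectl get svc` before using it.

```python
import json


def serve_ingress(name, service_name, host, service_port=8000, ingress_class="nginx"):
    """Return an Ingress manifest routing all paths on `host` to one service."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": {
            "ingressClassName": ingress_class,
            "rules": [
                {
                    "host": host,
                    "http": {
                        "paths": [
                            {
                                "path": "/",
                                "pathType": "Prefix",
                                "backend": {
                                    "service": {
                                        "name": service_name,
                                        "port": {"number": service_port},
                                    }
                                },
                            }
                        ]
                    },
                }
            ],
        },
    }


# All three arguments are placeholders for illustration.
ingress = serve_ingress("ray-serve", "demo-service-serve-svc", "serve.example.com")
print(json.dumps(ingress, indent=2))
```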

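Q: What happens when the Ray head node crashes? A: KubeRay detects the failure and recreates the head pod. With GCS fault tolerance enabled, which is opt-in and relies on an external Redis instance to persist cluster state, workers can reconnect to the new head without restarting. Without it, cluster state is lost and workers typically restart along with the head.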
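A sketch of turning on GCS fault tolerance explicitly is shown below. It assumes a KubeRay release whose RayCluster CRD supports the `gcsFaultToleranceOptions` field (older releases used annotations and environment variables instead) and a Redis instance you operate yourself; the helper name and the Redis address are hypothetical.

```python
import copy


def with_gcs_fault_tolerance(cluster_spec, redis_address):
    """Return a copy of a RayCluster spec that points GCS at an external Redis.

    Assumption: the installed CRD accepts `gcsFaultToleranceOptions`;
    verify the field against your KubeRay version before applying.
    """
    if not redis_address:
        raise ValueError("redis_address must be a non-empty host:port string")

    spec = copy.deepcopy(cluster_spec)  # do not mutate the caller's spec
    spec["gcsFaultToleranceOptions"] = {"redisAddress": redis_address}
    return spec


# Hypothetical in-cluster Redis service address.
ft_spec = with_gcs_fault_tolerance({"workerGroupSpecs": []}, "redis.default.svc:6379")
print(ft_spec)
```

In practice the Redis credentials would also need to be supplied, normally through a Kubernetes Secret rather than inline in the manifest.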
