Scripts · Apr 14, 2026 · 3 min read

Ray — Distributed Computing for Python and AI Workloads

Ray is a unified framework for scaling Python and AI applications. From distributed training and hyperparameter search to large-scale data processing and model serving, Ray powers the ML infrastructure behind ChatGPT and production systems at Uber and Pinterest.

TL;DR
Ray scales Python and AI workloads across clusters with a simple decorator API for distributed training, tuning, and serving.
§01

What it is

Ray is an open-source unified framework for scaling Python and AI applications. It handles distributed training, hyperparameter search, large-scale data processing, and model serving under a single API. You turn any Python function into a distributed task with the @ray.remote decorator.

ML engineers, data scientists, and platform teams use Ray when their workloads outgrow a single machine. Ray's ecosystem includes Ray Train (distributed training), Ray Tune (hyperparameter optimization), Ray Serve (model serving), and Ray Data (distributed data processing).
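
As a taste of that ecosystem, here is a minimal Ray Tune sketch; the objective function and search space are invented for illustration:

from ray import tune

# toy objective: find the x that minimizes (x - 3)^2;
# a function trainable may return its final metrics as a dict
def objective(config):
    return {"score": (config["x"] - 3) ** 2}

tuner = tune.Tuner(objective, param_space={"x": tune.uniform(-10, 10)})
results = tuner.fit()
print(results.get_best_result(metric="score", mode="min").config)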

§02

How it saves time or tokens

Ray abstracts away the complexity of distributed computing. Instead of managing MPI, Dask clusters, or custom job schedulers, you write standard Python and Ray handles scheduling, fault tolerance, and resource allocation. The @ray.remote decorator converts a function call into a distributed task without rewriting your code.

For LLM workloads, Ray Serve can host multiple models behind a single endpoint with autoscaling, reducing infrastructure management overhead.
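
As a sketch of what that composition can look like with Serve's deployment handles (the Small/Large models and the routing rule are invented for illustration):

from ray import serve

@serve.deployment
class Small:
    def __call__(self, text):
        return f"small model: {text}"  # placeholder for a cheap model

@serve.deployment
class Large:
    def __call__(self, text):
        return f"large model: {text}"  # placeholder for an expensive model

@serve.deployment
class Router:
    def __init__(self, small, large):
        # bound deployments arrive here as handles
        self.small, self.large = small, large

    async def __call__(self, request):
        body = await request.json()
        # route hard requests to the large model, the rest to the small one
        handle = self.large if body.get("hard") else self.small
        return await handle.remote(body["input"])

serve.run(Router.bind(Small.bind(), Large.bind()))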

§03

How to use

  1. Install Ray:
pip install 'ray[default]'
  2. Distribute a computation:
import ray

ray.init()  # starts a local Ray runtime, or connects to one already running

@ray.remote
def heavy(n):
    return sum(i * i for i in range(n))

# each .remote() call returns a future immediately; the work runs in parallel
futures = [heavy.remote(10_000_000) for _ in range(8)]
results = ray.get(futures)  # block until all eight tasks finish
print(sum(results))
  3. This runs 8 parallel tasks across available CPU cores. On a cluster, the same code distributes across nodes automatically.
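
To attach the same script to an existing cluster, point ray.init() at it instead of starting a local runtime; the addresses below are standard placeholders, not values from this article:

import ray

# "auto" discovers a cluster started on this node via `ray start`;
# from outside the cluster, use a Ray Client address like ray://<head-ip>:10001
ray.init(address="auto")

# tasks can also declare resource requirements for the scheduler
@ray.remote(num_cpus=2)
def heavy(n):
    return sum(i * i for i in range(n))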
§04

Example

Serving a model with Ray Serve:

from ray import serve

@serve.deployment
class Classifier:
    def __init__(self):
        self.model = load_model()  # placeholder: load your trained model here

    async def __call__(self, request):
        # Serve passes a Starlette Request; reading the body is async
        data = await request.json()
        return self.model.predict(data["input"])

serve.run(Classifier.bind())

This deploys a model endpoint with built-in autoscaling, health checks, and request batching.
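
Batching, for example, is opt-in via the serve.batch decorator. A minimal sketch, assuming a model whose predictions can be computed list-at-a-time:

from ray import serve

@serve.deployment
class BatchedClassifier:
    @serve.batch(max_batch_size=8, batch_wait_timeout_s=0.05)
    async def predict(self, inputs):
        # `inputs` is a list of individual payloads collected into one batch;
        # return one result per input, in the same order
        return [len(x) for x in inputs]  # placeholder for a real batched predict

    async def __call__(self, request):
        data = await request.json()
        # calling the batched method with one payload yields that payload's result
        return await self.predict(data["input"])

serve.run(BatchedClassifier.bind())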

§05

Common pitfalls

  • Ray's object store consumes significant memory. On machines with limited RAM, set object_store_memory in ray.init() to prevent out-of-memory errors (see the sketch after this list).
  • Serialization errors occur when passing non-picklable objects between tasks. Use Ray's built-in serialization or convert objects to serializable formats before passing.
  • Starting Ray on a cluster requires matching Python and Ray versions across all nodes. Use Docker images or Conda environments to ensure consistency.
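
A one-line sketch of the object-store cap mentioned above; the 2 GB figure is arbitrary:

import ray

# object_store_memory is specified in bytes
ray.init(object_store_memory=2 * 1024**3)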

Frequently Asked Questions

How does Ray compare to Dask?

Dask focuses on parallelizing NumPy and Pandas operations with familiar APIs. Ray is more general-purpose, handling arbitrary Python functions, actor-based concurrency, and ML-specific workloads like distributed training and model serving. Many teams use Ray for ML pipelines and Dask for data analytics.
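
The actor model mentioned above looks like this in practice, with a toy counter:

import ray

@ray.remote
class Counter:
    # a stateful actor: every method call runs in the same worker process
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        return self.value

counter = Counter.remote()                   # start the actor
print(ray.get(counter.increment.remote()))   # 1
print(ray.get(counter.increment.remote()))   # 2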

Can Ray run on Kubernetes?

Yes. The KubeRay operator deploys and manages Ray clusters on Kubernetes. It handles autoscaling, fault tolerance, and resource allocation. Ray also supports YARN and Slurm for HPC environments.

What is Ray Serve?

Ray Serve is Ray's model serving framework. It deploys ML models as scalable HTTP endpoints with features like request batching, model composition, A/B testing, and autoscaling. It runs on top of Ray's distributed runtime.

Does Ray support GPU workloads?

Yes. You can request GPU resources in the @ray.remote decorator: `@ray.remote(num_gpus=1)`. Ray's scheduler assigns tasks to nodes with available GPUs and handles multi-GPU distribution for training.
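
A quick sketch of a GPU task; it assumes PyTorch is installed on the workers, which Ray itself does not require:

import ray

@ray.remote(num_gpus=1)
def on_gpu():
    import torch  # assumption: PyTorch is available in the worker environment
    # Ray sets CUDA_VISIBLE_DEVICES so the task only sees its assigned GPU
    return torch.cuda.is_available()

print(ray.get(on_gpu.remote()))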

Is Ray suitable for small-scale projects?

Yes. Ray works on a single machine for local parallelism (replacing multiprocessing) and scales to clusters when needed. The overhead of ray.init() on a laptop is minimal, so you can develop locally and deploy to a cluster without code changes.

Citations (3)
  • Ray GitHub — Ray is a unified framework for scaling Python and AI applications
  • Ray Documentation — the Ray ecosystem includes Train, Tune, Serve, and Data libraries
  • KubeRay GitHub — KubeRay operator for running Ray on Kubernetes
