Ray — Distributed Computing for Python and AI Workloads
Ray is a unified framework for scaling Python and AI applications. From distributed training and hyperparameter search to large-scale data processing and model serving, Ray powers the infrastructure behind OpenAI's ChatGPT, Uber, and Pinterest.
What it is
Ray is an open-source unified framework for scaling Python and AI applications. It handles distributed training, hyperparameter search, large-scale data processing, and model serving under a single API. You turn any Python function into a distributed task with the @ray.remote decorator.
ML engineers, data scientists, and platform teams use Ray when their workloads outgrow a single machine. Ray's ecosystem includes Ray Train (distributed training), Ray Tune (hyperparameter optimization), Ray Serve (model serving), and Ray Data (distributed data processing).
How it saves time or tokens
Ray abstracts away the complexity of distributed computing. Instead of managing MPI, Dask clusters, or custom job schedulers, you write standard Python and Ray handles scheduling, fault tolerance, and resource allocation. The @ray.remote decorator converts a function call into a distributed task without rewriting your code.
For LLM workloads, Ray Serve can host multiple models behind a single endpoint with autoscaling, reducing infrastructure management overhead.
How to use
- Install Ray:

```shell
pip install 'ray[default]'
```
- Distribute a computation:

```python
import ray

ray.init()

@ray.remote
def heavy(n):
    return sum(i * i for i in range(n))

# .remote() returns futures immediately; the 8 tasks run in parallel.
futures = [heavy.remote(10_000_000) for _ in range(8)]
results = ray.get(futures)
print(sum(results))
```
- This runs 8 parallel tasks across available CPU cores. On a cluster, it distributes across multiple nodes automatically.
Example
Serving a model with Ray Serve (the handler must be async to read the request body):

```python
from ray import serve

@serve.deployment
class Classifier:
    def __init__(self):
        self.model = load_model()  # placeholder for your model loader

    async def __call__(self, request):
        # request is a Starlette Request; .json() is a coroutine.
        data = await request.json()
        return self.model.predict(data["input"])

serve.run(Classifier.bind())
```
This deploys a model endpoint with built-in autoscaling, health checks, and request batching.
Related on TokRepo
- AI Tools for Coding -- Development tools for building AI applications
- Local LLM Providers -- Run LLMs locally with distributed inference via Ray
Common pitfalls
- Ray's object store consumes significant memory. On machines with limited RAM, set `object_store_memory` in `ray.init()` to prevent out-of-memory errors.
- Serialization errors occur when passing non-picklable objects between tasks. Use Ray's built-in serialization or convert objects to serializable formats before passing.
- Starting Ray on a cluster requires matching Python and Ray versions across all nodes. Use Docker images or Conda environments to ensure consistency.
Frequently Asked Questions
How does Ray compare to Dask?
Dask focuses on parallelizing NumPy and Pandas operations with familiar APIs. Ray is more general-purpose, handling arbitrary Python functions, actor-based concurrency, and ML-specific workloads like distributed training and model serving. Many teams use Ray for ML pipelines and Dask for data analytics.
Can Ray run on Kubernetes?
Yes. The KubeRay operator deploys and manages Ray clusters on Kubernetes. It handles autoscaling, fault tolerance, and resource allocation. Ray also supports YARN and Slurm for HPC environments.
What is Ray Serve?
Ray Serve is Ray's model serving framework. It deploys ML models as scalable HTTP endpoints with features like request batching, model composition, A/B testing, and autoscaling. It runs on top of Ray's distributed runtime.
Can Ray tasks use GPUs?
Yes. You can request GPU resources in the @ray.remote decorator: `@ray.remote(num_gpus=1)`. Ray's scheduler assigns tasks to nodes with available GPUs and handles multi-GPU distribution for training.
Can I use Ray on a single machine?
Yes. Ray works on a single machine for local parallelism (replacing multiprocessing) and scales to clusters when needed. The overhead of ray.init() on a laptop is minimal, so you can develop locally and deploy to a cluster without code changes.
Citations (3)
- Ray GitHub — Ray is a unified framework for scaling Python and AI applications
- Ray Documentation — Ray ecosystem includes Train, Tune, Serve, and Data libraries
- KubeRay GitHub — KubeRay operator for running Ray on Kubernetes