# Ray — Distributed Computing for Python and AI Workloads

> Ray is a unified framework for scaling Python and AI applications. From distributed training and hyperparameter search to large-scale data processing and model serving, Ray powers the infrastructure behind ChatGPT, Uber, and Pinterest.

## Install

```bash
pip install "ray[default]"
```

## Quick Use

Save as a script file and run:

```python
import ray

ray.init()

@ray.remote
def heavy(n):
    return sum(i * i for i in range(n))

futures = [heavy.remote(10_000_000) for _ in range(8)]
results = ray.get(futures)
print(sum(results))
```

## Introduction

Ray is the open-source distributed computing framework that powers OpenAI's training infrastructure, Uber's ML platform, and Pinterest's ranking systems. Created at UC Berkeley's RISELab and developed by Anyscale, Ray scales the same Python code from a single laptop to a multi-thousand-node cluster. With over 42,000 GitHub stars, Ray is one of the most general-purpose AI infrastructure projects: distributed Python tasks (Ray Core), training (Ray Train), tuning (Ray Tune), serving (Ray Serve), and data processing (Ray Data) all run on one unified runtime.

## What Ray Does

Ray provides a `@ray.remote` decorator that turns Python functions and classes into distributed tasks and actors. The Ray runtime handles scheduling, fault tolerance, and inter-process communication. Higher-level libraries (Train, Tune, Serve, Data, RLlib) sit on top of this core for ML-specific workflows.

## Architecture Overview

```
[Driver Process]         Your Python script
         |
[Ray Cluster Runtime]    Head node + worker nodes
         |
    +---------+---------+---------+---------+
    |         |         |         |         |
Ray Core    Train      Tune     Serve     Data
tasks /     distrib.   hyper-   online    distrib.
actors      training   param    model     data
            (PyTorch,  search   serving   proc.
            Lightning, ...)
         |
[Object Store + Plasma]  zero-copy shared memory between tasks
         |
[Autoscaler]             add/remove EC2/GCE/Kubernetes nodes
```

## Self-Hosting & Configuration

```python
# Distributed actors
import ray

@ray.remote
class Counter:
    def __init__(self):
        self.n = 0

    def add(self, x):
        self.n += x
        return self.n

c = Counter.remote()
print(ray.get([c.add.remote(i) for i in range(10)]))

# Distributed training with Ray Train (PyTorch DDP)
from ray.train.torch import TorchTrainer
from ray.train import ScalingConfig

def train_func():
    import torch.nn as nn
    model = nn.Linear(10, 1)
    # ... full DDP setup handled by Ray Train

trainer = TorchTrainer(
    train_func,
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
)
trainer.fit()

# Online model serving with Ray Serve
# (in Ray 2.x the route prefix is passed to serve.run, not the decorator)
from ray import serve

@serve.deployment(num_replicas=4)
class Model:
    def __call__(self, request):
        return {"result": "hello"}

serve.run(Model.bind(), route_prefix="/predict")
```

## Key Features

- **Ray Core** — distributed tasks and actors via `@ray.remote`
- **Ray Train** — distributed training (PyTorch, TensorFlow, XGBoost)
- **Ray Tune** — hyperparameter search at scale (ASHA, BOHB, HyperOpt)
- **Ray Serve** — production model serving with autoscaling
- **Ray Data** — distributed data preprocessing for ML pipelines
- **RLlib** — reinforcement learning library at industrial scale
- **Autoscaler** — managed cluster expansion on AWS/GCP/Azure/Kubernetes
- **Object store** — zero-copy shared memory for fast intra-cluster transfers

## Comparison with Similar Tools

| Feature | Ray | Dask | Spark | Modal | Celery |
|---|---|---|---|---|---|
| Python-native | Yes | Yes | Wrapper | Yes | Yes |
| ML libraries | Train/Tune/Serve/RLlib | dask-ml | MLlib | Custom | None |
| Online serving | Yes (Ray Serve) | No | No | Yes | No |
| Stateful actors | Yes | Limited | Limited | Limited | No |
| Cluster management | Built-in autoscaler | Limited | YARN/K8s | Managed | None |
| Best for | AI/ML at scale | Pythonic data science | Big data ETL | Serverless GPUs | Background jobs |

## FAQ

**Q: Ray vs Dask?**
A: Dask is great for parallelizing pandas/NumPy workflows. Ray is broader: actors, distributed RL, online serving, and training. For pure DataFrame work, Dask is simpler; for ML platforms, Ray is the standard.

**Q: Ray vs Spark?**
A: Spark dominates traditional big-data ETL; Ray dominates ML training and serving. Many platforms run both: Spark for upstream data prep, Ray for downstream training.

**Q: Do I need Anyscale?**
A: No — Ray is fully open source and runs on your own infrastructure (laptop, EC2, Kubernetes via KubeRay). Anyscale offers a managed service if you don't want to operate clusters.

**Q: How does it scale?**
A: From single-machine multiprocessing to thousands of nodes with the same code. The autoscaler talks to AWS/GCP/Azure or KubeRay (the Kubernetes operator) to add and remove workers based on demand.

## Sources

- GitHub: https://github.com/ray-project/ray
- Docs: https://docs.ray.io
- Company: Anyscale
- License: Apache-2.0

---

Source: https://tokrepo.com/en/workflows/b0f2e5e4-37db-11f1-9bc6-00163e2b0d79
Author: Script Depot
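The autoscaler answer in the FAQ refers to Ray's cluster launcher, which is driven by a YAML file passed to `ray up`. A minimal sketch for AWS is below; the instance types, region, and node-type names are placeholders, and the exact schema should be checked against docs.ray.io for your Ray version:

```yaml
cluster_name: demo
max_workers: 10

provider:
  type: aws
  region: us-west-2

auth:
  ssh_user: ubuntu

available_node_types:
  head:
    node_config:
      InstanceType: m5.large
    max_workers: 0        # never schedule extra workers on the head
  workers:
    node_config:
      InstanceType: m5.2xlarge
    min_workers: 0        # scale to zero when idle
    max_workers: 10       # autoscaler ceiling for this node type

head_node_type: head
```

`ray up cluster.yaml` launches the head node; the autoscaler then adds and removes worker nodes between `min_workers` and `max_workers` as pending tasks demand resources.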
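The Quick Use snippet above follows the standard futures pattern: `heavy.remote(...)` returns a future-like `ObjectRef` immediately, and `ray.get` blocks until the results are ready. The same shape can be sketched on one machine with the stdlib's `concurrent.futures`, no Ray required (a process pool would be the closer analogy for CPU-bound work; a thread pool keeps this sketch self-contained, and `n` is scaled down):

```python
from concurrent.futures import ThreadPoolExecutor

def heavy(n):
    # Same CPU-bound toy work as the Ray "Quick Use" task, scaled down.
    return sum(i * i for i in range(n))

# submit() returns a Future at once, like heavy.remote() returns an ObjectRef.
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(heavy, 100_000) for _ in range(8)]
    # result() blocks until the task finishes, like ray.get(futures).
    results = [f.result() for f in futures]

print(sum(results))
```

Ray generalizes this model: the futures can resolve on any machine in the cluster, and their results live in the shared object store rather than in the driver process.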
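The ASHA scheduler mentioned under Ray Tune is built on successive halving: evaluate many configurations on a small budget, keep the best fraction, and give the survivors a larger budget. A toy, single-process illustration of that idea (the objective function, halving rate `eta`, and budget schedule here are illustrative, not Tune's API):

```python
import random

def evaluate(config, budget):
    # Toy objective: score improves as the learning rate nears an
    # (unknown to the searcher) optimum of 0.1, plus a budget bonus.
    return -abs(config["lr"] - 0.1) + 0.01 * budget

def successive_halving(configs, budget=1, eta=2, max_budget=8):
    # Repeatedly score all survivors, keep the top 1/eta, grow the budget.
    while len(configs) > 1 and budget <= max_budget:
        scored = sorted(configs, key=lambda c: evaluate(c, budget), reverse=True)
        configs = scored[: max(1, len(scored) // eta)]
        budget *= eta
    return configs[0]

random.seed(0)
candidates = [{"lr": random.uniform(0.001, 1.0)} for _ in range(16)]
best = successive_halving(candidates)
print(best)
```

ASHA's refinement over plain successive halving is asynchrony: trials are promoted as soon as enough peers finish a rung, so slow stragglers never stall the search across a cluster.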
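The object store described above keeps large objects in shared memory so that workers on the same node can read them without copying. A rough stdlib illustration of that underlying idea using `multiprocessing.shared_memory` (this mimics what the Plasma-based store does for you; it is not Ray's API):

```python
from multiprocessing import shared_memory

payload = b"x" * (1 << 20)  # 1 MiB of data to "put" in the store

# Producer side: write the payload into a named shared-memory block.
shm = shared_memory.SharedMemory(create=True, size=len(payload))
shm.buf[: len(payload)] = payload

# Consumer side: attach to the same block by name. The data itself is
# not copied; the reader gets a zero-copy view into the same pages.
reader = shared_memory.SharedMemory(name=shm.name)
view = memoryview(reader.buf)[: len(payload)]
checksum = sum(view[:16])  # read a few bytes through the view

# Clean up: release views before closing, unlink once on the producer.
view.release()
reader.close()
shm.close()
shm.unlink()
print(checksum)
```

In Ray, `ray.put(obj)` plays the producer role and `ray.get(ref)` the consumer role; for cross-node reads the object is transferred once and then shared zero-copy within the destination node.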