# Ray — Distributed Computing for Python and AI Workloads

> Ray is a unified framework for scaling Python and AI applications. From distributed training and hyperparameter search to large-scale data processing and model serving, Ray powers the infrastructure behind ChatGPT, Uber, and Pinterest.

## Install

```bash
pip install "ray[default]"
```

## Quick Use

Save as a script file and run:

```python
import ray

ray.init()

@ray.remote
def heavy(n):
    return sum(i * i for i in range(n))

futures = [heavy.remote(10_000_000) for _ in range(8)]
results = ray.get(futures)
print(sum(results))
```

## Introduction

Ray is the open-source distributed computing framework that powers OpenAI's training infrastructure, Uber's ML platform, and Pinterest's ranking systems. Created at UC Berkeley's RISELab and developed by Anyscale, Ray scales the same Python code from a single laptop to a multi-thousand-node cluster. With over 42,000 GitHub stars, Ray is one of the most general-purpose AI infrastructure projects: distributed Python tasks (Ray Core), training (Ray Train), tuning (Ray Tune), serving (Ray Serve), and data processing (Ray Data) all run on one unified runtime.

## What Ray Does

Ray provides a `@ray.remote` decorator that turns Python functions and classes into distributed tasks and actors. The Ray runtime handles scheduling, fault tolerance, and inter-process communication. Higher-level libraries (Train, Tune, Serve, Data, RLlib) sit on top of this core for ML-specific workflows.

## Architecture Overview

```
[Driver Process]         Your Python script
         |
[Ray Cluster Runtime]    Head node + worker nodes
         |
    +---------+---------+---------+---------+
    |         |         |         |         |
Ray Core    Train      Tune     Serve     Data
tasks /     distrib.   hyper-   online    distrib.
actors      training   param    model     data
            (PyTorch,  search   serving   proc.
            Lightning, ...)
         |
[Object Store + Plasma]  zero-copy shared memory between tasks
         |
[Autoscaler]             add/remove EC2/GCE/Kubernetes nodes
```

## Self-Hosting & Configuration

```python
# Distributed actors
import ray

@ray.remote
class Counter:
    def __init__(self):
        self.n = 0

    def add(self, x):
        self.n += x
        return self.n

c = Counter.remote()
print(ray.get([c.add.remote(i) for i in range(10)]))

# Distributed training with Ray Train (PyTorch DDP)
from ray.train.torch import TorchTrainer
from ray.train import ScalingConfig

def train_func():
    import torch.nn as nn
    model = nn.Linear(10, 1)
    # ... full DDP setup handled by Ray Train

trainer = TorchTrainer(
    train_func,
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
)
trainer.fit()

# Online model serving with Ray Serve
# (in Ray 2.x the route prefix is passed to serve.run, not the decorator)
from ray import serve

@serve.deployment(num_replicas=4)
class Model:
    def __call__(self, request):
        return {"result": "hello"}

serve.run(Model.bind(), route_prefix="/predict")
```

## Key Features

- **Ray Core** — distributed tasks and actors via `@ray.remote`
- **Ray Train** — distributed training (PyTorch, TensorFlow, XGBoost)
- **Ray Tune** — hyperparameter search at scale (ASHA, BOHB, HyperOpt)
- **Ray Serve** — production model serving with autoscaling
- **Ray Data** — distributed data preprocessing for ML pipelines
- **RLlib** — reinforcement learning library at industrial scale
- **Autoscaler** — managed cluster expansion on AWS/GCP/Azure/Kubernetes
- **Object store** — zero-copy shared memory for fast intra-cluster transfers

## Comparison with Similar Tools

| Feature | Ray | Dask | Spark | Modal | Celery |
|---|---|---|---|---|---|
| Python-native | Yes | Yes | Wrapper | Yes | Yes |
| ML libraries | Train/Tune/Serve/RLlib | dask-ml | MLlib | Custom | None |
| Online serving | Yes (Ray Serve) | No | No | Yes | No |
| Stateful actors | Yes | Limited | Limited | Limited | No |
| Cluster management | Built-in autoscaler | Limited | YARN/K8s | Managed | None |
| Best for | AI/ML at scale | Pythonic data science | Big data ETL | Serverless GPUs | Background jobs |

## FAQ

**Q: Ray vs Dask?**
A: Dask is great for parallelizing pandas/NumPy workflows. Ray is broader: actors, distributed RL, online serving, and training. For pure DataFrame work, Dask is simpler; for ML platforms, Ray is the standard.

**Q: Ray vs Spark?**
A: Spark dominates traditional big-data ETL; Ray dominates ML training and serving. Many platforms run both: Spark for upstream data prep, Ray for downstream training.

**Q: Do I need Anyscale?**
A: No — Ray is fully open source and runs on your own infrastructure (laptop, EC2, Kubernetes via KubeRay). Anyscale offers a managed service if you don't want to operate clusters.

**Q: How does it scale?**
A: From single-machine multiprocessing to thousands of nodes with the same code. The autoscaler talks to AWS/GCP/Azure or KubeRay (the Kubernetes operator) to add and remove workers based on demand.

## Sources

- GitHub: https://github.com/ray-project/ray
- Docs: https://docs.ray.io
- Company: Anyscale
- License: Apache-2.0

---

Source: https://tokrepo.com/en/workflows/b0f2e5e4-37db-11f1-9bc6-00163e2b0d79
Author: Script Depot
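The autoscaler answer in the FAQ refers to Ray's cluster launcher, which is driven by a YAML file passed to `ray up`. A minimal sketch for AWS is below; the instance types, region, and node-type names are placeholders, and the exact schema should be checked against docs.ray.io for your Ray version:

```yaml
cluster_name: demo
max_workers: 10

provider:
  type: aws
  region: us-west-2

auth:
  ssh_user: ubuntu

available_node_types:
  head:
    node_config:
      InstanceType: m5.large
    max_workers: 0        # never schedule extra workers on the head
  workers:
    node_config:
      InstanceType: m5.2xlarge
    min_workers: 0        # scale to zero when idle
    max_workers: 10       # autoscaler ceiling for this node type

head_node_type: head
```

`ray up cluster.yaml` launches the head node; the autoscaler then adds and removes worker nodes between `min_workers` and `max_workers` as pending tasks demand resources.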
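The Quick Use snippet above follows the standard futures pattern: `heavy.remote(...)` returns a future-like `ObjectRef` immediately, and `ray.get` blocks until the results are ready. The same shape can be sketched on one machine with the stdlib's `concurrent.futures`, no Ray required (a process pool would be the closer analogy for CPU-bound work; a thread pool keeps this sketch self-contained, and `n` is scaled down):

```python
from concurrent.futures import ThreadPoolExecutor

def heavy(n):
    # Same CPU-bound toy work as the Ray "Quick Use" task, scaled down.
    return sum(i * i for i in range(n))

# submit() returns a Future at once, like heavy.remote() returns an ObjectRef.
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(heavy, 100_000) for _ in range(8)]
    # result() blocks until the task finishes, like ray.get(futures).
    results = [f.result() for f in futures]

print(sum(results))
```

Ray generalizes this model: the futures can resolve on any machine in the cluster, and their results live in the shared object store rather than in the driver process.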
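The ASHA scheduler mentioned under Ray Tune is built on successive halving: evaluate many configurations on a small budget, keep the best fraction, and give the survivors a larger budget. A toy, single-process illustration of that idea (the objective function, halving rate `eta`, and budget schedule here are illustrative, not Tune's API):

```python
import random

def evaluate(config, budget):
    # Toy objective: score improves as the learning rate nears an
    # (unknown to the searcher) optimum of 0.1, plus a budget bonus.
    return -abs(config["lr"] - 0.1) + 0.01 * budget

def successive_halving(configs, budget=1, eta=2, max_budget=8):
    # Repeatedly score all survivors, keep the top 1/eta, grow the budget.
    while len(configs) > 1 and budget <= max_budget:
        scored = sorted(configs, key=lambda c: evaluate(c, budget), reverse=True)
        configs = scored[: max(1, len(scored) // eta)]
        budget *= eta
    return configs[0]

random.seed(0)
candidates = [{"lr": random.uniform(0.001, 1.0)} for _ in range(16)]
best = successive_halving(candidates)
print(best)
```

ASHA's refinement over plain successive halving is asynchrony: trials are promoted as soon as enough peers finish a rung, so slow stragglers never stall the search across a cluster.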
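The object store described above keeps large objects in shared memory so that workers on the same node can read them without copying. A rough stdlib illustration of that underlying idea using `multiprocessing.shared_memory` (this mimics what the Plasma-based store does for you; it is not Ray's API):

```python
from multiprocessing import shared_memory

payload = b"x" * (1 << 20)  # 1 MiB of data to "put" in the store

# Producer side: write the payload into a named shared-memory block.
shm = shared_memory.SharedMemory(create=True, size=len(payload))
shm.buf[: len(payload)] = payload

# Consumer side: attach to the same block by name. The data itself is
# not copied; the reader gets a zero-copy view into the same pages.
reader = shared_memory.SharedMemory(name=shm.name)
view = memoryview(reader.buf)[: len(payload)]
checksum = sum(view[:16])  # read a few bytes through the view

# Clean up: release views before closing, unlink once on the producer.
view.release()
reader.close()
shm.close()
shm.unlink()
print(checksum)
```

In Ray, `ray.put(obj)` plays the producer role and `ray.get(ref)` the consumer role; for cross-node reads the object is transferred once and then shared zero-copy within the destination node.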