Introduction
Rayon is the de facto standard data parallelism library for Rust. It provides parallel iterators that closely mirror standard Rust iterators but distribute work across the available CPU cores automatically. Because Rayon's APIs build on Rust's Send and Sync guarantees, safe code that compiles is free of data races, which makes parallelism safe by construction.
With over 13,000 GitHub stars, Rayon is used by Servo, Polars, and many other performance-critical Rust projects. It implements work-stealing scheduling for efficient load balancing across threads.
What Rayon Does
Rayon provides the ParallelIterator trait, which mirrors the standard Iterator trait. When you call .par_iter() instead of .iter(), Rayon recursively splits the work into tasks, distributes them across a thread pool using work-stealing, and combines the results, while Rust's type system (the Send and Sync traits together with the borrow checker) ensures thread safety.
Architecture Overview
[Your Code]
  vec.par_iter().map(f).filter(g).sum()
                   |
[Rayon Parallel Iterator]
  Splits data into chunks
                   |
[Work-Stealing Thread Pool]
  Each thread takes a chunk
  Idle threads steal from busy ones
                   |
    +---------+---------+---------+
    |         |         |         |
[Core 1]  [Core 2]  [Core 3]  [Core N]
Process   Process   Process   Process
chunk 1   chunk 2   chunk 3   chunk N
                   |
[Collect Results]
  Merge partial results
  Return final value

Self-Hosting & Configuration
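use rayon::prelude::*;
use rayon::ThreadPoolBuilder;

// Placeholder helpers standing in for real application code.
fn expensive_computation(item: &i32) -> i32 { item * item }
fn compute_left() -> i32 { 1 }
fn compute_right() -> i32 { 2 }
fn process_batch(chunk: &[i32]) { println!("batch of {} items", chunk.len()); }

fn main() {
    let mut data: Vec<i32> = vec![5, 2, 8, 1, 9];

    // Custom thread pool; build() returns an error if the pool cannot be created
    let pool = ThreadPoolBuilder::new()
        .num_threads(4)
        .build()
        .unwrap();

    // Parallel work started inside install() runs on this pool's 4 threads
    let result: Vec<i32> = pool.install(|| {
        data.par_iter()
            .map(|item| expensive_computation(item))
            .collect()
    });
    println!("{result:?}");

    // Parallel join (fork-join pattern): the two closures may run in parallel
    let (left, right) = rayon::join(
        || compute_left(),
        || compute_right(),
    );
    println!("{left} {right}");

    // Parallel sort variants
    data.par_sort();                     // stable sort
    data.par_sort_unstable();            // faster, unstable
    data.par_sort_by(|a, b| b.cmp(a));   // custom comparator
    data.par_sort_by_key(|x| x.abs());   // sort by key

    // Parallel chunks processing
    data.par_chunks(100).for_each(|chunk| {
        process_batch(chunk);
    });
}

Key Features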
- Drop-In Parallelism: replace .iter() with .par_iter()
- Zero Data Races: guaranteed by the Rust type system for safe code
- Work-Stealing: efficient load balancing across threads
- Parallel Sort: par_sort, par_sort_unstable, par_sort_by, par_sort_by_key
- Join: fork-join parallelism for independent computations
- Thread Pool: configurable global or custom thread pools
- Scope: structured parallelism with borrowed data
- Bridge: turn a sequential iterator into a parallel one with par_bridge()
Comparison with Similar Tools
| Feature | Rayon | std::thread | tokio | crossbeam | OpenMP (C/C++) |
|---|---|---|---|---|---|
| Paradigm | Data parallelism | Manual threads | Async I/O | Concurrent data structures | Compiler directives |
| Data-race safety | Compile-time | Compile-time | Compile-time | Compile-time | None (programmer's responsibility) |
| Ease of Use | Very Easy | Moderate | Moderate | Moderate | Easy |
| Best For | CPU-bound parallel | Custom threading | I/O-bound async | Lock-free data | C/C++ parallelism |
| Scheduling | Work-stealing | Manual | Cooperative | Manual | Static, dynamic, or guided |
FAQ
Q: When should I use Rayon vs Tokio? A: Rayon for CPU-bound parallelism (processing data, computation). Tokio for I/O-bound concurrency (network requests, file I/O). They solve different problems and can be used together.
Q: How much speedup can I expect? A: For embarrassingly parallel workloads (independent items with enough work per item), expect speedup that approaches linear in the number of cores. Dependencies between items, very small items, or memory-bandwidth limits reduce the gain.
Q: Is Rayon always faster than sequential? A: No. For very small collections or cheap per-item operations, the cost of splitting and scheduling tasks can exceed the benefit. Rayon shines when each item requires significant computation.
Q: Can I control the number of threads? A: Yes. Use ThreadPoolBuilder to set num_threads, or set the RAYON_NUM_THREADS environment variable. The default is the number of logical CPU cores.
Sources
- GitHub: https://github.com/rayon-rs/rayon
- Documentation: https://docs.rs/rayon
- Created by Niko Matsakis and Josh Stone
- License: MIT / Apache-2.0