What is CuPy — NumPy and SciPy for GPU?

CuPy is an open-source array library accelerated with NVIDIA CUDA. It serves as a largely drop-in replacement for NumPy and SciPy on the GPU.

Is CuPy — NumPy and SciPy for GPU free to use?

Yes. CuPy is open-source software released under the MIT License, and its TokRepo asset page is free to access. See the Source & Thanks section on the asset page for license details.

How do I install CuPy — NumPy and SciPy for GPU?

Install the CuPy wheel that matches your CUDA version, for example pip install cupy-cuda12x for CUDA 12.x. The TokRepo asset page also provides installation instructions through its "Copy for agent" button.

CuPy — NumPy and SciPy for GPU

Introduction

CuPy is an open-source Python library that mirrors the NumPy and SciPy APIs while executing operations on NVIDIA GPUs via CUDA. In many cases, changing a single import line lets existing NumPy code use GPU acceleration with minimal refactoring. CuPy is maintained by Preferred Networks and is used in scientific computing, deep learning preprocessing, and signal processing workloads.
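A minimal sketch of the import swap, assuming CuPy and a CUDA device may or may not be present. The NumPy fallback and the helper name normalize_rows are illustrative choices, not part of CuPy; the point is that the function body only uses NumPy-style calls, so it runs unchanged on either backend.

```python
import numpy as np

# Assumption: fall back to NumPy when CuPy or a usable CUDA device is missing,
# so the same code also runs on CPU-only machines.
try:
    import cupy as xp
    xp.zeros(1)  # raises here if no CUDA device/driver is available
except Exception:
    xp = np


def normalize_rows(m):
    """Scale each row of a 2-D array to unit L2 norm (hypothetical helper)."""
    norms = xp.linalg.norm(m, axis=1, keepdims=True)
    return m / norms


x = xp.asarray([[3.0, 4.0], [6.0, 8.0]])
y = normalize_rows(x)

# Copy the result back to host memory as a NumPy array (not needed on the CPU path).
y_host = xp.asnumpy(y) if xp is not np else y
print(y_host)
```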

What CuPy Does

Provides a GPU-backed ndarray compatible with NumPy array operations
Implements hundreds of NumPy and SciPy functions including linear algebra, FFT, and sparse matrices
Supports custom CUDA kernels through ElementwiseKernel and RawKernel APIs
Integrates with cuDNN, cuBLAS, cuSOLVER, cuSPARSE, and NCCL for optimized routines
Interoperates with PyTorch, TensorFlow, and other GPU libraries through DLPack and the CUDA Array Interface
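A short sketch of the ElementwiseKernel API mentioned above: the kernel takes input and output parameter declarations plus a CUDA C++ snippet applied to every element, and it is compiled on first use. The kernel name squared_diff follows the style of CuPy's own documentation; the NumPy fallback is an assumption added so the snippet also runs without a GPU.

```python
import numpy as np

try:
    import cupy as cp
    cp.zeros(1)  # raises if no usable CUDA device is present
    # Element-wise kernel: the operation string is CUDA C++ run once per element.
    squared_diff = cp.ElementwiseKernel(
        "float32 x, float32 y",   # input parameters (typed)
        "float32 z",              # output parameter
        "z = (x - y) * (x - y)",  # per-element body
        "squared_diff",           # kernel name
    )
    xp = cp
except Exception:
    # Assumption: CPU fallback with the same semantics for machines without a GPU.
    xp = np

    def squared_diff(x, y):
        return ((x - y) * (x - y)).astype(np.float32)


a = xp.arange(5, dtype=xp.float32)
b = xp.full(5, 2, dtype=xp.float32)
out = squared_diff(a, b)

# .get() copies a CuPy array back to host memory as a NumPy array.
out_host = out.get() if xp is not np else out
print(out_host)
```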

Architecture Overview

CuPy allocates device memory through a memory pool that caches freed blocks, reducing the overhead of repeated CUDA allocation and deallocation calls. Array operations either launch CUDA kernels that CuPy generates and compiles at runtime or call into NVIDIA library routines such as cuBLAS and cuFFT. Compiled kernels are cached in memory and on disk, so each kernel is compiled once and then reused rather than rebuilt on every call. CuPy's API closely follows NumPy's, so array-agnostic code written for NumPy can often run on CuPy unchanged.
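The pooling behavior can be observed through the default memory pool. This sketch uses the real MemoryPool methods used_bytes(), total_bytes(), and free_all_blocks(); the helper name pool_demo and the 1 MiB allocation size are illustrative assumptions. It returns None when CuPy or a GPU is unavailable.

```python
def pool_demo(n_bytes=1 << 20):
    """Return memory-pool byte counts at each step, or None without CuPy/GPU (hypothetical helper)."""
    try:
        import cupy as cp
        cp.zeros(1)  # raises if no usable CUDA device is present
    except Exception:
        return None

    pool = cp.get_default_memory_pool()
    a = cp.empty(n_bytes, dtype=cp.uint8)   # memory is served from the pool
    used_after_alloc = pool.used_bytes()
    del a                                    # the block returns to the pool, not to the driver
    used_after_del = pool.used_bytes()
    held_after_del = pool.total_bytes()      # bytes the pool still reserves on the device
    pool.free_all_blocks()                   # release cached, unused blocks back to CUDA
    held_after_release = pool.total_bytes()
    return {
        "used_after_alloc": used_after_alloc,
        "used_after_del": used_after_del,
        "held_after_del": held_after_del,
        "held_after_release": held_after_release,
    }


stats = pool_demo()
print(stats if stats is not None else "CuPy/GPU not available")
```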

Self-Hosting & Configuration

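Install the wheel matching your CUDA version, e.g. pip install cupy-cuda12x for CUDA 12.x or cupy-cuda11x for CUDA 11.x
Set CUPY_CACHE_DIR to choose where JIT-compiled kernels are cached on disk (the default is ~/.cupy/kernel_cache), so they persist across runs
Use cupy.cuda.Device(n) as a context manager (with cupy.cuda.Device(n): ...) or call cupy.cuda.Device(n).use() to select which GPU to target
Configure the memory pool with cupy.get_default_memory_pool().set_limit(size=4*1024**3) to cap pool usage on the current device at 4 GiB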
For multi-GPU work, combine CuPy with mpi4py or NCCL communicators
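The device-selection and memory-limit settings above can be combined as follows. This is a sketch assuming device 0 and a 4 GiB limit; configure_gpu is a hypothetical helper, while cupy.cuda.runtime.getDeviceCount, cupy.cuda.Device, and MemoryPool.set_limit/get_limit are real CuPy APIs. The limit applies to whichever device is current when set_limit is called, which is why it is set inside the device context.

```python
def configure_gpu(device_id=0, limit_bytes=4 * 1024**3):
    """Select a GPU, cap its memory pool, and run a tiny check (hypothetical helper)."""
    try:
        import cupy as cp
        n_devices = cp.cuda.runtime.getDeviceCount()
    except Exception:
        return None  # CuPy or the CUDA driver is not available
    if n_devices == 0:
        return None
    if device_id >= n_devices:
        raise ValueError(f"device {device_id} requested, only {n_devices} present")

    with cp.cuda.Device(device_id):          # arrays below are allocated on this GPU
        pool = cp.get_default_memory_pool()
        pool.set_limit(size=limit_bytes)      # cap the pool on the current device
        x = cp.ones(1024, dtype=cp.float32)
        return {
            "device": cp.cuda.Device().id,    # id of the currently active device
            "limit": pool.get_limit(),
            "sum": float(x.sum()),
        }


cfg = configure_gpu()
print(cfg if cfg is not None else "CuPy/GPU not available")
```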

Key Features

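Near drop-in NumPy replacement; many scripts need only an import change
Can deliver large speedups over CPU NumPy on large arrays (10-100x is commonly cited), though actual gains depend on array size, data type, GPU model, and time spent copying data between host and device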
Supports CUDA Graphs for reduced kernel-launch overhead
Works with AMD ROCm GPUs through an experimental HIP backend
Actively maintained with regular releases tracking CUDA toolkit versions
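Speedup figures are only meaningful when GPU timings account for asynchronous kernel launches and first-call compilation. The sketch below times a matrix multiply on both backends with a warm-up call and an explicit device synchronize; best_time is a hypothetical helper, and the 512x512 float32 size is an arbitrary assumption. CuPy also ships cupyx.profiler.benchmark for this purpose.

```python
import math
import time

import numpy as np


def best_time(fn, *args, repeat=5, sync=None):
    """Return the best wall-clock time in seconds over `repeat` calls of fn(*args)."""
    fn(*args)  # warm-up: on the GPU path this also triggers kernel compilation/caching
    if sync is not None:
        sync()
    best = math.inf
    for _ in range(repeat):
        start = time.perf_counter()
        fn(*args)
        if sync is not None:
            sync()  # GPU launches are asynchronous: wait before stopping the clock
        best = min(best, time.perf_counter() - start)
    return best


n = 512
a_cpu = np.random.default_rng(0).random((n, n), dtype=np.float32)
cpu_s = best_time(np.matmul, a_cpu, a_cpu)

gpu_s = None
try:
    import cupy as cp
    a_gpu = cp.asarray(a_cpu)  # one-time host-to-device copy, excluded from timing
    gpu_s = best_time(cp.matmul, a_gpu, a_gpu, sync=cp.cuda.Device().synchronize)
except Exception:
    pass  # assumption: no CuPy or no usable GPU, so only the CPU time is reported

print(f"NumPy: {cpu_s * 1e3:.2f} ms")
if gpu_s is not None:
    print(f"CuPy:  {gpu_s * 1e3:.2f} ms (speedup {cpu_s / gpu_s:.1f}x)")
```

Note that the host-to-device copy is deliberately outside the timed region; including it gives a more realistic number for workloads that move data every call.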

Comparison with Similar Tools

NumPy — CPU-only; CuPy mirrors its API on the GPU
JAX — JIT-compiled with autograd focus; CuPy is closer to a direct NumPy port
PyTorch Tensors — deep learning-oriented; CuPy targets general scientific computing
RAPIDS cuDF — GPU DataFrames for tabular data that interoperate with CuPy arrays
Numba — JIT-compiles Python loops to GPU; CuPy provides pre-built array ops

FAQ

Q: Can I use CuPy without NVIDIA hardware? A: CuPy requires a CUDA-capable GPU by default, but an experimental ROCm backend supports AMD GPUs.

Q: Does CuPy work in Jupyter notebooks? A: Yes. Install the appropriate cupy wheel, and GPU arrays display just like NumPy arrays in cells.

Q: How does CuPy handle data transfer between CPU and GPU? A: Use cupy.asarray(np_array) to send data to GPU and cupy.asnumpy(cp_array) to bring it back.
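A minimal round-trip sketch using the two calls named in the answer: cupy.asarray copies a NumPy array to the GPU, and cupy.asnumpy copies the result back as a numpy.ndarray. The CPU fallback branch is an assumption so the snippet runs without a GPU.

```python
import numpy as np

try:
    import cupy as cp
    cp.zeros(1)  # raises if no usable CUDA device is present
except Exception:
    cp = None

host = np.linspace(0.0, 1.0, 5)  # float64 array in host memory

if cp is not None:
    dev = cp.asarray(host)   # host -> device copy
    dev *= 2                 # computed on the GPU
    back = cp.asnumpy(dev)   # device -> host copy, returns numpy.ndarray
else:
    back = host * 2          # assumption: CPU fallback when no GPU is present

print(back)
```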

Q: Is CuPy compatible with the latest CUDA versions? A: CuPy ships wheels for each major CUDA release. Check the installation guide for your CUDA version.

Related Assets

LoRAX — Multi-LoRA Inference Server for Fine-Tuned LLMs

SciPy — Fundamental Algorithms for Scientific Computing

NumPy — The Fundamental Package for Scientific Computing

scikit-image — Image Processing Algorithms for Python