# cuDF — GPU-Accelerated DataFrame Library by NVIDIA RAPIDS

> cuDF is a GPU-accelerated DataFrame library from the NVIDIA RAPIDS suite that provides a pandas-like API for data manipulation at 10-100x the speed on NVIDIA GPUs.

## Install

Save the content below to `.claude/skills/` or append to your `CLAUDE.md`:

## Quick Use
```bash
pip install cudf-cu12
python -c "
import cudf
gdf = cudf.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
print(gdf.groupby('a').sum())
"
# Or use cudf.pandas for zero-code-change acceleration
python -m cudf.pandas your_script.py
```

## Introduction
cuDF is an open-source GPU DataFrame library that is part of the NVIDIA RAPIDS ecosystem. It provides a familiar pandas-like API while executing operations on NVIDIA GPUs, delivering dramatic speedups for data loading, transformation, and aggregation tasks common in data science and feature engineering workflows.

## What cuDF Does
- Accelerates DataFrame operations (filter, join, groupby, sort) on NVIDIA GPUs
- Provides a pandas-compatible API so existing code runs with minimal changes
- Reads and writes Parquet, CSV, ORC, JSON, and Apache Arrow formats on GPU
- Offers a `cudf.pandas` accelerator that automatically dispatches operations to GPU
- Integrates with Dask for multi-GPU and multi-node distributed processing
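
The core operations listed above (filter, join, groupby, sort) can be sketched as follows. This is a minimal illustration, not a benchmark: the table and column names are hypothetical, and the `pandas` fallback is an assumption added only so the snippet also runs on a machine without cuDF installed.

```python
# Use cuDF when available; fall back to pandas (an assumption made only so
# this sketch also runs on machines without a GPU).
try:
    import cudf as xdf
except ImportError:
    import pandas as xdf

# Illustrative data; the table and column names are hypothetical.
orders = xdf.DataFrame({
    "customer_id": [1, 2, 1, 3, 2],
    "amount": [20.0, 35.0, 15.0, 50.0, 5.0],
})
customers = xdf.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["east", "west", "east"],
})

# Filter: keep orders of at least 10.
large = orders[orders["amount"] >= 10.0]

# Join: attach each customer's region to the remaining orders.
joined = large.merge(customers, on="customer_id", how="inner")

# Groupby: total order amount per region.
totals = joined.groupby("region")["amount"].sum().reset_index()

# Sort: largest total first.
totals = totals.sort_values("amount", ascending=False).reset_index(drop=True)

print(totals)
```

The same calls work on both libraries because the sketch stays within the part of the API that cuDF shares with pandas.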

## Architecture Overview
cuDF stores columnar data in GPU memory using the Apache Arrow columnar format, and operations run as CUDA kernels optimized for GPU parallelism. User-defined functions are JIT-compiled into GPU code (via Numba) rather than executed row by row in Python. For multi-GPU workflows, Dask-cuDF partitions DataFrames across GPUs and coordinates shuffles. The `cudf.pandas` proxy layer intercepts pandas calls at runtime, routes supported operations to the GPU, and falls back to pandas on the CPU for unsupported ones.
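
As a rough sketch of the proxy layer described above, `cudf.pandas` can also be enabled from inside a script rather than with `python -m cudf.pandas`. The `try`/`except` guard and the `accelerated` flag are assumptions added for illustration, so the same code runs as plain pandas where cuDF is not installed.

```python
# Enable the accelerator before pandas is imported; otherwise the proxy
# cannot intercept pandas calls.
try:
    import cudf.pandas
    cudf.pandas.install()
    accelerated = True
except ImportError:
    # Assumption for this sketch: silently continue with CPU pandas.
    accelerated = False

import pandas as pd

# Ordinary pandas code; no cuDF-specific calls are needed below this line.
df = pd.DataFrame({"key": ["x", "y", "x", "y"], "value": [1, 2, 3, 4]})
totals = df.groupby("key")["value"].sum()

print("GPU accelerator active:", accelerated)
print(totals)
```

The design point is that the script body stays unchanged; only the two lines that install the accelerator differ from a CPU-only run.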

## Installation & Configuration
- Install via pip: `pip install cudf-cu12` for CUDA 12 or use conda from the RAPIDS channel
- Requires an NVIDIA GPU with compute capability 7.0+ (Volta or newer)
- Use `cudf.pandas` as a drop-in accelerator: `python -m cudf.pandas script.py`
- Configure the RMM memory manager for custom GPU memory pool strategies
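- Scale to multiple GPUs with Dask-cuDF: set the Dask DataFrame backend to cuDF with `dask.config.set({"dataframe.backend": "cudf"})` so that `dask.dataframe.read_parquet()` returns cuDF-backed partitions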

## Key Features
- Speedups of up to 10-100x over pandas on large datasets, depending on workload and hardware
- `cudf.pandas` provides zero-code-change GPU acceleration for existing scripts
- Native Parquet and ORC readers that decompress and decode directly on GPU
- String processing, regex, and datetime operations fully GPU-accelerated
- Seamless interop with CuPy, cuML, and other RAPIDS libraries via `__cuda_array_interface__`
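
The string and regex support mentioned in the list above uses the familiar `.str` accessor. The following is a small sketch with made-up values; the `pandas` fallback is an assumption so the snippet also runs without a GPU.

```python
# Use cuDF when available; fall back to pandas (assumption for portability).
try:
    import cudf as xdf
except ImportError:
    import pandas as xdf

# Hypothetical identifiers used only for illustration.
s = xdf.Series(["alpha-01", "beta-22", "gamma-x"])

# Regex match: does the value end in one or more digits?
has_digits = s.str.contains(r"\d+$", regex=True)

# Literal (non-regex) replacement of the separator.
replaced = s.str.replace("-", "_", regex=False)

# Split on the separator into separate columns (labelled 0 and 1).
parts = s.str.split("-", expand=True)

print(has_digits)
print(replaced)
print(parts)
```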

## Comparison with Similar Tools
- **pandas** — CPU-only; cuDF provides a largely compatible API with GPU acceleration
- **Polars** — Fast CPU DataFrame library written in Rust; cuDF relies on GPU parallelism instead, which can yield larger speedups on big datasets
- **PySpark** — Distributed CPU processing; cuDF + Dask provides GPU-accelerated distributed DataFrames
- **Modin** — Parallelizes pandas on CPU cores; cuDF parallelizes on GPU cores
- **Vaex** — Out-of-core CPU DataFrames; cuDF keeps data in GPU memory for lower latency

## FAQ
**Q: Do I need to rewrite my pandas code to use cuDF?**
A: No. Run existing pandas scripts with `python -m cudf.pandas script.py`; supported operations run on the GPU and the rest fall back to pandas on the CPU. For new code, the cuDF API mirrors pandas closely.

**Q: How much GPU memory do I need?**
A: Your dataset, plus the intermediate results of operations on it, must fit in GPU memory. For larger-than-memory workloads, use Dask-cuDF to partition across multiple GPUs.
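
One way to sanity-check this is to measure a DataFrame's footprint with `memory_usage()` and compare it against the GPU's capacity. The sketch below is only a rough estimate: the 16 GiB capacity and the 50% headroom factor are hypothetical values chosen for illustration, not cuDF recommendations, and the `pandas` fallback is an assumption so the snippet runs without a GPU.

```python
# Use cuDF when available; fall back to pandas (assumption for portability).
try:
    import cudf as xdf
except ImportError:
    import pandas as xdf

# Hypothetical values: adjust to your GPU and workload.
GPU_MEMORY_BYTES = 16 * 1024**3  # assumed 16 GiB card
HEADROOM = 0.5                   # assumed share left free for intermediates

df = xdf.DataFrame({"id": list(range(1000)), "score": [0.5] * 1000})

# memory_usage() reports bytes per column (and for the index).
bytes_used = int(df.memory_usage(index=True).sum())

# Crude check: does the data fit within the assumed budget?
fits = bytes_used <= HEADROOM * GPU_MEMORY_BYTES

print(f"DataFrame size: {bytes_used / 1e6:.3f} MB, fits in budget: {fits}")
```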

**Q: Can cuDF handle string and text data?**
A: Yes, cuDF provides GPU-accelerated string operations including regex, split, replace, and contains.

**Q: Which file formats are supported?**
A: Parquet, CSV, ORC, JSON, and Apache Arrow IPC, all with GPU-accelerated readers and writers.
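
A minimal round trip through one of these formats might look like the following. CSV is used here only to keep the sketch free of extra dependencies; `to_parquet` and `read_parquet` follow the same pattern. The file name and data are hypothetical, and the `pandas` fallback is an assumption so the snippet runs without a GPU.

```python
import os
import tempfile

# Use cuDF when available; fall back to pandas (assumption for portability).
try:
    import cudf as xdf
except ImportError:
    import pandas as xdf

# Hypothetical sample data.
df = xdf.DataFrame({
    "city": ["Oslo", "Lima", "Pune"],
    "temp_c": [4.5, 19.0, 31.2],
})

# Write to a temporary file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "weather.csv")
    df.to_csv(path, index=False)
    loaded = xdf.read_csv(path)

print(loaded)
```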

## Sources
- https://github.com/rapidsai/cudf
- https://docs.rapids.ai/api/cudf/stable/

---
Source: https://tokrepo.com/en/workflows/cudf-gpu-accelerated-dataframe-library-nvidia-rapids-8fee0711
Author: AI Open Source