# xsv — Fast CSV Toolkit Written in Rust

> xsv is a blazing-fast command-line toolkit for working with CSV data. It provides indexing, slicing, searching, joining, aggregation, and statistics — processing millions of rows per second for data analysis, ETL pipelines, and CSV manipulation.

## Quick Use
```bash
# Install xsv
brew install xsv
# Or: cargo install xsv

# View CSV structure
xsv headers data.csv

# Show first 10 rows
xsv slice -l 10 data.csv | xsv table

# Select specific columns
xsv select name,email,age data.csv

# Search/filter rows (the pattern is a regex; anchor it so "inactive" does not match)
xsv search -s status '^active$' data.csv

# Statistics for all columns
xsv stats data.csv | xsv table

# Sort by a column (-N compares numerically, -R reverses the order)
xsv sort -s revenue -N -R data.csv
```

## Introduction
xsv is a command-line toolkit that covers many of the everyday tabular tasks people reach for pandas to do (selecting, filtering, joining, summarizing), but as a compiled Rust binary that, for most commands, streams rows instead of loading the whole file into memory. It can process CSV files with millions of rows in seconds.

With over 11,000 GitHub stars, xsv was created by Andrew Gallant (also the creator of ripgrep). It is the go-to tool for anyone who works with CSV data in the terminal and needs performance that awk, cut, and Python scripts cannot match.

## What xsv Does
xsv provides a suite of subcommands for CSV manipulation: headers (show column names), select (pick columns), search (filter rows by regex), sort, join (SQL-like joins between CSVs), stats (column statistics), frequency (value distributions), and more. Most of them stream rows one at a time; a few, such as sort, must read the whole input into memory.
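
As a minimal sketch of how these subcommands chain together, the block below builds a tiny `sales.csv` and filters it. The file name, columns, and values are made up for illustration, and the example assumes `xsv` is on your `PATH` (it skips the xsv steps otherwise).

```bash
# Create a small, made-up sample file (hypothetical data for illustration).
cat > sales.csv <<'EOF'
date,product,category,revenue,quantity,region
2024-01-05,widget,tools,120.50,3,US
2024-01-06,gadget,toys,80.00,1,EU
2024-01-07,gizmo,tools,300.25,7,US
2024-01-08,doohickey,toys,45.10,2,APAC
EOF

if command -v xsv >/dev/null 2>&1; then
  # Keep rows whose region matches the regex ^US$, then keep two columns.
  xsv search -s region '^US$' sales.csv | xsv select product,revenue > us_sales.csv
  # Count data rows (the header row is not counted).
  xsv count us_sales.csv
fi
```

Because `search` takes a regular expression, anchoring the pattern with `^` and `$` avoids matching values that merely contain the text.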

## Architecture Overview
```
[CSV Input]
Stdin, file, or multiple files
        |
   [xsv Subcommands]
+-------+-------+-------+
|       |       |       |
[select] [search] [stats]
Pick     Filter   Min, max,
columns  by regex mean, stdev

[sort]   [join]   [frequency]
Order    SQL-like  Value
by column inner/   distributions
         outer join

[slice]  [split]  [fmt]
Row      Split    Reformat
ranges   into     delimiter
         chunks   alignment
        |
   [Streaming Processing]
   Processes rows without
   loading entire file
   into memory
        |
[CSV Output]
Stdout, file, or pipe
```

## Example Workflow
```bash
# Data exploration workflow

# 1. Understand the data
xsv headers sales.csv
# date,product,category,revenue,quantity,region

xsv stats sales.csv | xsv table
# Shows type, min, max, mean, stddev for each column

# 2. Filter and select
xsv search -s region "US" sales.csv | xsv select product,revenue,quantity > us_sales.csv

# 3. Sort and slice
xsv sort -s revenue -N -R sales.csv | xsv slice -l 20 | xsv table
# Top 20 rows by revenue (-N sorts numerically instead of lexicographically)

# 4. Frequency analysis
xsv frequency -s category sales.csv | xsv table
# Shows value counts for category column

# 5. Join two CSVs
xsv join product sales.csv product_id products.csv > enriched.csv

# 6. Split large file
xsv split -s 10000 output_dir/ large_file.csv
# Creates chunks of 10,000 rows each

# 7. Count rows
xsv count sales.csv

# 8. Index for faster operations
xsv index sales.csv  # creates sales.csv.idx
xsv slice -s 1000000 -l 100 sales.csv  # 100 rows starting at record offset 1000000, located via the index

# Pipeline example
xsv search -s status "completed" orders.csv \
  | xsv select customer_id,amount \
  | xsv sort -s amount -N -R \
  | xsv slice -l 10 \
  | xsv table
```

## Key Features
- **Blazing Fast**: compiled Rust that can process millions of rows per second
- **Streaming**: most commands stream rows, so they work on files larger than available RAM (`sort` is an exception and reads the whole input into memory)
- **Select**: pick columns by name or index
- **Search**: filter rows by regex, optionally restricted to selected columns
- **Sort**: lexicographic by default, numeric with `-N`, reversed with `-R`
- **Join**: inner (default), left, right, and full outer joins between CSV files
- **Stats**: min, max, mean, and stddev by default; median and other extras with `--everything`
- **Frequency**: value distribution counts for categorical columns
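
To illustrate the join variants listed above, here is a small sketch using two made-up files, `orders.csv` and `products.csv` (both hypothetical, as are their columns). It assumes `xsv` is installed and skips the xsv steps otherwise.

```bash
# Two hypothetical input files; order 3 refers to a product that does not exist.
cat > orders.csv <<'EOF'
order_id,product_id,amount
1,p1,10
2,p2,20
3,p9,30
EOF
cat > products.csv <<'EOF'
product_id,name
p1,widget
p2,gadget
EOF

if command -v xsv >/dev/null 2>&1; then
  # Inner join (the default): only orders with a matching product are kept.
  xsv join product_id orders.csv product_id products.csv | xsv table
  # Left join: every order is kept; unmatched product fields are left empty.
  xsv join --left product_id orders.csv product_id products.csv | xsv table
fi
```

The argument order is column, file, column, file, so the join key can have a different name in each input.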

## Comparison with Similar Tools
| Feature | xsv | csvkit | Miller (mlr) | cut + awk | pandas (Python) |
|---|---|---|---|---|---|
| Language | Rust | Python | Go (C before v6) | C (coreutils) | Python |
| Speed | Very Fast | Slow | Fast | Moderate | Moderate |
| Memory | Streaming | In-memory | Streaming | Streaming | In-memory |
| Formats | CSV only | CSV + more | CSV + JSON | Text only | Any format |
| Statistics | Built-in | Via csvstat | Built-in | Manual | Built-in |
| Joins | Yes | Yes | Yes | No | Yes |
| Best For | Large CSV processing | Python users | Multi-format | Simple tasks | Full analysis |

## FAQ
**Q: xsv vs Miller (mlr) — which should I choose?**
A: xsv for pure CSV processing with maximum speed. Miller for multi-format support (CSV, JSON, JSONL) and more transformation capabilities. xsv is faster; Miller is more versatile.

**Q: Can xsv handle files larger than RAM?**
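A: Mostly. Commands such as select, search, and slice stream rows and use roughly constant memory. sort is an exception: it reads the whole input into memory, and an index does not change that. An index created with "xsv index" speeds up commands that benefit from random access, such as slice, count, and split.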
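
A rough sketch of the indexing workflow, using a generated stand-in file named `big.csv` (hypothetical; substitute your own large CSV). It assumes `xsv` is installed and skips the xsv steps otherwise.

```bash
# Generate a hypothetical 100,000-row file to stand in for a large CSV.
{ echo "id,value"; seq 1 100000 | awk '{ print $1 "," $1 * 2 }'; } > big.csv

if command -v xsv >/dev/null 2>&1; then
  xsv index big.csv                 # writes big.csv.idx next to the file
  xsv count big.csv                 # can be answered from the index
  xsv slice -s 90000 -l 3 big.csv   # 3 rows starting at record offset 90000
fi
```

If the CSV changes after indexing, rerun `xsv index` so the index matches the data again.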

**Q: How do I change the delimiter?**
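A: Use the -d flag for the input delimiter, for example "xsv stats -d '\t' data.tsv" for tab-separated files (files ending in .tsv or .tab are read as tab-delimited automatically). Output is comma-separated by default; to change it, pipe the result through "xsv fmt -t '\t'".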
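
A short sketch of reading and writing tab-separated data, using a made-up `people.tsv` (hypothetical file and columns). It assumes `xsv` is installed and that `fmt` accepts `-t` as its output-delimiter flag; check `xsv fmt --help` on your version.

```bash
# Hypothetical tab-separated sample file.
printf 'name\tage\nalice\t30\nbob\t25\n' > people.tsv

if command -v xsv >/dev/null 2>&1; then
  # -d sets the input delimiter; the output is comma-separated by default.
  xsv select -d '\t' name,age people.tsv
  # Pipe through fmt to write tabs instead of commas.
  xsv select -d '\t' name,age people.tsv | xsv fmt -t '\t'
fi
```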

**Q: Can xsv replace pandas for data analysis?**
A: For simple operations (filter, select, sort, join, stats), xsv is faster and uses less memory. For complex analysis (pivot tables, groupby with custom aggregations, plotting), pandas is more capable.

## Sources
- GitHub: https://github.com/BurntSushi/xsv
- Created by Andrew Gallant (BurntSushi, also created ripgrep)
- License: Unlicense / MIT

---
Source: https://tokrepo.com/en/workflows/82f0e8a4-3745-11f1-9bc6-00163e2b0d79
Author: AI Open Source