What is xsv — Fast CSV Toolkit Written in Rust?

xsv is a fast command-line toolkit for working with CSV data. It provides indexing, slicing, searching, joining, frequency counts, and summary statistics, and it is suited to data analysis, ETL pipelines, and general CSV manipulation on files with millions of rows.

xsv — Fast CSV Toolkit Written in Rust

Introduction

xsv is a command-line toolkit for CSV that covers many everyday pandas-style tasks (selecting, filtering, joining, summarizing) at the speed of compiled Rust. Most subcommands stream their input, so they do not need to load the whole file into memory, and files with millions of rows typically finish in seconds.

xsv was created by Andrew Gallant (BurntSushi), who also wrote ripgrep, and has over 11,000 GitHub stars. It is a popular choice for people who work with CSV data in the terminal and need more speed than awk, cut, or Python scripts usually provide.

What xsv Does

xsv provides a suite of subcommands for CSV manipulation: headers (show column names), select (pick columns), search (filter rows by regex), sort, join (SQL-like joins between CSVs), stats (column statistics), frequency (value distributions), and more. Most of them stream their input row by row, which keeps memory use low.

Architecture Overview

[CSV Input]
Stdin, file, or multiple files
        |
[xsv Subcommands]
  select     Pick columns
  search     Filter rows by regex
  stats      Min, max, mean, stddev
  sort       Order by column
  join       SQL-like inner/outer join
  frequency  Value distributions
  slice      Row ranges
  split      Split into chunks
  fmt        Reformat delimiter and quoting
        |
[Streaming Processing]
Most subcommands process rows without loading the entire file into memory
        |
[CSV Output]
Stdout, file, or pipe
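
To make the streaming stage in the diagram concrete, here is a minimal Python sketch of one-pass column statistics. It is an illustration of the idea, not xsv's implementation: the function name, the sample data, and the choice of Welford's running-variance update are all assumptions made for the example.

```python
import csv
import io
import math


def streaming_stats(lines, column):
    """One pass over CSV rows, keeping only running totals in memory."""
    count, mean, m2 = 0, 0.0, 0.0
    lo, hi = math.inf, -math.inf
    for row in csv.DictReader(lines):
        value = float(row[column])
        count += 1
        # Welford's update: numerically stable running mean and variance.
        delta = value - mean
        mean += delta / count
        m2 += delta * (value - mean)
        lo, hi = min(lo, value), max(hi, value)
    # Population standard deviation (divides by count, not count - 1).
    stddev = math.sqrt(m2 / count) if count else float("nan")
    return {"count": count, "min": lo, "max": hi, "mean": mean, "stddev": stddev}


# Hypothetical three-row sample; any iterable of text lines works.
sample = io.StringIO("product,revenue\nA,10\nB,20\nC,30\n")
result = streaming_stats(sample, "revenue")
print(result)
```

Because only a handful of numbers are kept between rows, memory use stays constant however long the input is, which is the property the diagram attributes to the streaming stage.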

Self-Hosting & Configuration

# Data exploration workflow

# 1. Understand the data
xsv headers sales.csv
# date,product,category,revenue,quantity,region

xsv stats sales.csv | xsv table
# Shows type, min, max, mean, stddev for each column

# 2. Filter and select
xsv search -s region "US" sales.csv | xsv select product,revenue,quantity > us_sales.csv

# 3. Sort and slice
xsv sort -N -R -s revenue sales.csv | xsv slice -l 20 | xsv table
# Top 20 rows by revenue (-N compares numerically, -R reverses the order)

# 4. Frequency analysis
xsv frequency -s category sales.csv | xsv table
# Shows value counts for category column

# 5. Join two CSVs
xsv join product sales.csv product_id products.csv > enriched.csv

# 6. Split large file
xsv split -s 10000 output_dir/ large_file.csv
# Creates chunks of 10,000 rows each

# 7. Count rows
xsv count sales.csv

# 8. Index for faster operations
xsv index sales.csv  # creates sales.csv.idx
xsv slice -s 1000000 -l 100 sales.csv  # 100 records starting at record 1,000,000; fast with the index

# Pipeline example: top 10 completed orders by amount
xsv search -s status "completed" orders.csv \
  | xsv select customer_id,amount \
  | xsv sort -N -R -s amount \
  | xsv slice -l 10 \
  | xsv table
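
For readers who think in Python, the pipeline example above can be sketched with generators. This is a rough equivalent under stated assumptions, not a description of how xsv works internally: the column names come from the pipeline, the sample rows are invented, and `heapq.nlargest` is swapped in for "sort in descending order, then keep the first rows" so that only the current top rows are held in memory.

```python
import csv
import heapq
import io
import re


def top_completed(lines, limit=10):
    """Mirror search -> select -> sort -> slice for the orders data."""
    pattern = re.compile("completed")
    rows = csv.DictReader(lines)
    # search: keep rows whose status matches the regex (streams row by row).
    matched = (r for r in rows if pattern.search(r["status"]))
    # select: keep only the two columns of interest (still streaming).
    selected = ((r["customer_id"], float(r["amount"])) for r in matched)
    # sort + slice: only the current top `limit` rows are held in memory.
    return heapq.nlargest(limit, selected, key=lambda pair: pair[1])


# Hypothetical sample standing in for orders.csv.
orders = io.StringIO(
    "customer_id,amount,status\n"
    "c1,25.0,completed\n"
    "c2,99.5,pending\n"
    "c3,140.0,completed\n"
    "c4,7.25,completed\n"
)
top = top_completed(orders, limit=2)
print(top)
```

The filter and column selection never need more than one row at a time; only the ordering step has to remember rows, which is why sorting is the memory-sensitive part of a pipeline like this.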

Key Features

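Fast: written in Rust and built to get through multi-million-row files quickly
Streaming: most commands read row by row, so they work on files larger than available RAM (sort is the exception and loads the data into memory)
Select: pick columns by name or index
Search: filter rows by regex on any column
Sort: sort by any column, lexicographically by default or numerically with -N
Join: inner, left, right, and full outer joins between CSV files
Stats: min, max, mean, and stddev for every column by default; median and other extras on request (for example with --everything)
Frequency: value distribution counts for categorical columns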
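
The join feature in the list above can be pictured as a hash join: index one file by its key column, then stream the other file past it. The sketch below is a generic Python illustration of that technique, not xsv's code; the `product` and `product_id` column names follow the join command shown earlier, and the sample rows are made up.

```python
import csv
import io


def inner_join(left_lines, left_key, right_lines, right_key):
    """Inner join: index the right-hand rows by key, then stream the left."""
    index = {}
    for row in csv.DictReader(right_lines):
        index.setdefault(row[right_key], []).append(row)
    for row in csv.DictReader(left_lines):
        # Left rows without a match are dropped, as in an inner join.
        for match in index.get(row[left_key], []):
            yield {**row, **match}


# Hypothetical stand-ins for sales.csv and products.csv.
sales = io.StringIO("product,revenue\nP1,100\nP2,250\nP9,40\n")
products = io.StringIO("product_id,name\nP1,Widget\nP2,Gadget\n")
joined = list(inner_join(sales, "product", products, "product_id"))
for row in joined:
    print(row)
```

Only the indexed side has to fit in memory, so with this approach it usually pays to index the smaller of the two files.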

Comparison with Similar Tools

Feature	xsv	csvkit	Miller (mlr)	cut + awk	pandas (Python)
Language	Rust	Python	Go (C before version 6)	C	Python
Speed	Very Fast	Slow	Fast	Moderate	Moderate
Memory	Streaming	In-memory	Streaming	Streaming	In-memory
Formats	CSV only	CSV + more	CSV + JSON	Text only	Many formats
Statistics	Built-in	Via csvstat	Built-in	Manual	Built-in
Joins	Yes	Yes	Yes	No	Yes
Best For	Large CSV processing	Python users	Multi-format	Simple tasks	Full analysis

FAQ

Q: xsv vs Miller (mlr): which should I choose? A: Choose xsv for pure CSV processing where speed matters most. Choose Miller for multi-format support (CSV, JSON, JSON Lines) and richer transformations. In short, xsv is faster and Miller is more versatile.

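Q: Can xsv handle files larger than RAM? A: For most operations, yes: commands such as select, search, and slice stream their input. The main exception is sort, which reads all of the data into memory. An index created with "xsv index" does not change that, but it gives constant-time access to any record, which speeds up commands such as slice and count.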

Q: How do I change the delimiter? A: Use the -d flag to set the input delimiter, for example "xsv stats -d '\t' data.tsv" for tab-separated files. To write a different delimiter, pipe the result through "xsv fmt -t '\t'".

Q: Can xsv replace pandas for data analysis? A: For simple operations (filter, select, sort, join, stats), xsv is faster and uses less memory. For complex analysis (pivot tables, groupby with custom aggregations, plotting), pandas is more capable.
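
The index mentioned in the answer about large files can be understood as a table of byte offsets, one per record, that lets a reader seek straight to a given row. The Python sketch below illustrates that idea only. It does not reproduce the format of xsv's .idx file, the function names are hypothetical, and it assumes no quoted field contains a line break.

```python
import os
import tempfile


def build_index(path):
    """Record the byte offset where each data record starts."""
    offsets = []
    with open(path, "rb") as f:
        f.readline()  # skip the header row
        while True:
            pos = f.tell()
            if not f.readline():
                break
            offsets.append(pos)
    return offsets


def read_records(path, offsets, start, length):
    """Seek straight to record `start` and read up to `length` records."""
    if start >= len(offsets):
        return []
    with open(path, "rb") as f:
        f.seek(offsets[start])
        wanted = min(length, len(offsets) - start)
        return [f.readline().decode().rstrip("\r\n") for _ in range(wanted)]


# Hypothetical sample file written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")
    with open(path, "w", newline="") as f:
        f.write("id,value\n" + "".join(f"{i},{i * i}\n" for i in range(1000)))
    offsets = build_index(path)
    window = read_records(path, offsets, start=500, length=3)
    print(window)
```

Building the offsets takes one pass over the file, after which the row count is simply the length of the list and any slice costs one seek instead of a scan from the top.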

Sources

GitHub: https://github.com/BurntSushi/xsv
Created by Andrew Gallant (BurntSushi, also created ripgrep)
License: Unlicense / MIT
