Miller — Like awk, sed, cut, join, and sort for CSV, TSV, JSON
Miller (mlr) is a multi-purpose command-line tool for processing name-indexed data such as CSV, TSV, JSON, JSON Lines, and positionally-indexed records, blending awk-style expressions with pandas-like DataFrame operations.
What it is
Miller (mlr) is a command-line tool for processing structured data formats including CSV, TSV, JSON, and JSON Lines. It combines the functionality of awk, sed, cut, join, and sort into a single binary that understands named fields. Written in Go, Miller ships as a zero-dependency binary.
Miller targets data engineers, analysts, and developers who work with structured data files in the terminal. It bridges the gap between Unix text tools (which treat everything as unstructured text) and full data processing frameworks (which require writing programs).
How it saves time or tokens
Miller operates on named columns directly, so you do not have to count field positions as you would in awk. It reads and writes multiple formats in a single command, so you can ingest CSV and output JSON without a separate conversion step. Chaining verbs with the then keyword builds a multi-step pipeline in one command. No programming language setup or library imports are required.
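As a rough sketch of the points above, the following reads CSV, filters and derives a field by name, sorts, and writes JSON in one command. The file traffic.csv, its columns, and the /tmp/mlr-demo directory are hypothetical and exist only for this illustration.

```shell
# Hypothetical sample data; file name and columns are assumptions for this sketch.
mkdir -p /tmp/mlr-demo && cd /tmp/mlr-demo
cat > traffic.csv <<'EOF'
region,status,visits,revenue
east,active,100,250
west,inactive,200,300
east,active,50,200
EOF

# One command: read CSV, keep active rows, derive a field by name,
# sort by the derived field (numeric, descending), and write JSON.
mlr --icsv --ojson \
  filter '$status == "active"' \
  then put '$rate = $revenue / $visits' \
  then sort -nr rate \
  traffic.csv
```

Every step refers to columns by header name, so reordering the columns in the input file would not change the command.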
How to use
- Install Miller via your package manager: brew install miller on macOS or apt install miller on Debian/Ubuntu.
- Run mlr --csv head -n 5 data.csv to preview data, or pipe from stdin.
- Chain operations with then: filter, sort, group-by, join, and statistical aggregations.
Example
# Pretty-print a CSV file
mlr --icsv --opprint cat data.csv
# Filter rows and compute a new field
mlr --csv filter '$status == "active"' \
then put '$rate = $revenue / $visits' \
data.csv
# Convert CSV to JSON
mlr --icsv --ojson cat data.csv
# Group-by aggregation
mlr --csv stats1 -a mean,count -f revenue -g region data.csv
# Sort by region (ascending), then by revenue (numeric, descending)
mlr --csv sort -f region -nr revenue data.csv
Related on TokRepo
- Automation Tools — CLI tools for data processing and automation
- Coding AI Tools — Developer productivity tools
Common pitfalls
- Miller field names are case-sensitive. The CSV headers 'Name' and 'name' are different fields. Check your headers with mlr --csv head -n 1.
- The --from flag names the input file before the verb chain instead of after it, which keeps long then chains easier to edit. If your shell has quoting conflicts with filter or put expressions, store the expression in a file and load it with the verb's -f option.
- Miller v6 (Go rewrite) changed some command-line flag behavior from v5 (C version). Check the documentation if upgrading from an older version.
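A small sketch of the first two pitfalls, assuming a hypothetical users.csv whose header is spelled Name. It also uses filter -f, which loads the expression from a file (here the made-up active.mlr) so that the shell never sees the quotes.

```shell
# Hypothetical sample data; file names are assumptions for this sketch.
mkdir -p /tmp/mlr-demo && cd /tmp/mlr-demo
cat > users.csv <<'EOF'
Name,status
Ada,active
Bob,inactive
EOF

# Show the first record as JSON to see the exact header spellings.
mlr --icsv --ojson head -n 1 users.csv

# Rename the header once so later expressions can use the lowercase name.
mlr --csv rename Name,name then filter '$name == "Ada"' users.csv

# Keep the expression in a file to avoid shell quoting issues,
# and name the input file up front with --from.
printf '%s\n' '$status == "active"' > active.mlr
mlr --csv --from users.csv filter -f active.mlr
```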
Frequently Asked Questions
How is Miller different from awk?
awk treats input as positional fields separated by delimiters. Miller understands named columns from CSV/TSV/JSON headers, so you reference $column_name instead of $1, $2. Miller also parses quoted fields, escapes, and multi-line CSV fields correctly, which awk's plain delimiter splitting does not.
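To make the quoting difference concrete, here is a minimal sketch with a hypothetical quoted.csv in which one field contains a comma inside quotes.

```shell
# Hypothetical sample data; the file name is an assumption for this sketch.
mkdir -p /tmp/mlr-demo && cd /tmp/mlr-demo
cat > quoted.csv <<'EOF'
name,city
"Smith, Jane",Boston
EOF

# awk splits on every comma, so the quoted name is cut in two
# and field 2 is a fragment of the name rather than the city.
awk -F, 'NR > 1 { print $2 }' quoted.csv

# Miller parses the CSV quoting and selects the city column by name.
mlr --icsv --ojson cut -f city quoted.csv

# The full record keeps the comma inside the name field.
mlr --icsv --ojson cat quoted.csv
```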
Does Miller work with JSON?
Yes. Miller supports JSON, JSON Lines, CSV, TSV, DKVP, XTAB, and other formats for both input and output. You can mix formats freely, such as reading CSV and writing JSON with --icsv --ojson.
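The same input can be written out in several formats by changing only the output flag. This sketch assumes a hypothetical formats.csv; --c2j is Miller's shorthand for --icsv --ojson.

```shell
# Hypothetical sample data; the file name is an assumption for this sketch.
mkdir -p /tmp/mlr-demo && cd /tmp/mlr-demo
cat > formats.csv <<'EOF'
region,visits,revenue
east,100,250
west,200,300
EOF

# CSV in, DKVP (comma-separated key=value pairs) out.
mlr --icsv --odkvp cat formats.csv

# CSV in, XTAB (one field per line) out.
mlr --icsv --oxtab cat formats.csv

# CSV in, JSON out, using the shorthand flag.
mlr --c2j cat formats.csv
```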
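Can Miller join two files?
Yes. The join verb joins two files on named key fields. By default it emits only paired records (an inner join); the --ul and --ur flags also emit unpaired records from the left and right files, which gives left-join and right-join behavior similar to SQL joins.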
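A minimal join sketch, assuming two hypothetical files that share a customer_id column. The file given with -f is the left input and the file at the end of the command is the right input.

```shell
# Hypothetical sample data; file names and columns are assumptions for this sketch.
mkdir -p /tmp/mlr-demo && cd /tmp/mlr-demo
cat > orders.csv <<'EOF'
order_id,customer_id,amount
1,10,25
2,11,40
3,12,15
EOF
cat > customers.csv <<'EOF'
customer_id,name
10,Ada
11,Bob
EOF

# Inner join on customer_id: only orders with a matching customer are emitted.
mlr --csv join -j customer_id -f customers.csv orders.csv

# Unpaired right records only: orders that have no matching customer.
# --np suppresses paired records, --ur emits unpaired records from the right input.
mlr --csv join --np --ur -j customer_id -f customers.csv orders.csv
```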
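Can Miller compute statistics?
Yes. The stats1 verb computes univariate statistics such as mean, median, standard deviation, min, max, and count. The stats2 verb computes bivariate statistics such as correlation, covariance, and linear regression. You can group by fields with -g for segmented analysis, similar to SQL GROUP BY.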
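As an illustration of the two verbs, the sketch below runs a grouped stats1 and an ungrouped stats2 on a hypothetical metrics.csv. The numbers are made up for the example.

```shell
# Hypothetical sample data; file name and values are assumptions for this sketch.
mkdir -p /tmp/mlr-demo && cd /tmp/mlr-demo
cat > metrics.csv <<'EOF'
region,visits,revenue
east,100,250
east,50,200
west,200,300
west,80,120
EOF

# Univariate: count, mean, and median of revenue for each region.
mlr --icsv --opprint stats1 -a count,mean,median -f revenue -g region metrics.csv

# Bivariate: correlation between visits and revenue across all rows.
mlr --icsv --opprint stats2 -a corr -f visits,revenue metrics.csv
```

Output fields are named after the input field and the accumulator, for example revenue_count and revenue_median.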
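How does Miller handle large files?
Miller v6 is written in Go, and most verbs process records in streaming fashion without loading entire files into memory, so multi-gigabyte files are practical. Verbs that must see every record before producing output, such as sort, hold records in memory. For simple streaming transformations Miller can be faster than loading the file into Python pandas, though the difference depends on the workload.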