Scripts · Apr 15, 2026 · 3 min read

Miller — Like awk, sed, cut, join, and sort for CSV, TSV, JSON

Miller (mlr) is a multi-purpose command-line tool for processing name-indexed data such as CSV, TSV, JSON, and JSON Lines, as well as positionally-indexed records. It blends awk-style expressions with pandas-like DataFrame operations.

TL;DR
Miller (mlr) processes CSV, TSV, JSON, and JSON Lines from the command line with awk-style filtering, sorting, joining, and statistical operations.
§01

What it is

Miller (mlr) is a command-line tool for processing structured data formats including CSV, TSV, JSON, and JSON Lines. It combines the functionality of awk, sed, cut, join, and sort into a single binary that understands named fields. Written in Go, Miller ships as a zero-dependency binary.

Miller targets data engineers, analysts, and developers who work with structured data files in the terminal. It bridges the gap between Unix text tools (which treat everything as unstructured text) and full data processing frameworks (which require writing programs).

§02

How it saves time or tokens

Miller operates on named columns directly, so you do not have to count field positions as you do in awk. It reads and writes multiple formats in a single command, so you can ingest CSV and output JSON without a separate conversion step. Chaining verbs with then builds a data pipeline in one line. No programming language setup or library imports are required.

§03

How to use

  1. Install Miller via your package manager: brew install miller on macOS or apt install miller on Debian/Ubuntu.
  2. Run mlr --csv head -n 5 data.csv to preview data, or pipe from stdin.
  3. Chain operations with then: filter, sort, group-by, join, and statistical aggregations.
§04

Example

# Pretty-print a CSV file
mlr --icsv --opprint cat data.csv

# Filter rows and compute a new field
mlr --csv filter '$status == "active"' \
  then put '$rate = $revenue / $visits' \
  data.csv

# Convert CSV to JSON
mlr --icsv --ojson cat data.csv

# Group-by aggregation
mlr --csv stats1 -a mean,count -f revenue -g region data.csv

# Sort by region (lexical ascending), then by revenue (numeric descending)
mlr --csv sort -f region -nr revenue data.csv
§06

Common pitfalls

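  • Miller field names are case-sensitive: the CSV headers 'Name' and 'name' are different fields. Check your headers with mlr --csv head -n 1 data.csv.
  • The --from flag names the input file before the verb chain (mlr --csv --from data.csv filter ... then ...), which keeps the file name out of the way when you edit long then-chains. It does not change shell quoting: wrap DSL expressions in single quotes, or load them from a file with put -f or filter -f.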
  • Miller v6 (Go rewrite) changed some command-line flag behavior from v5 (C version). Check the documentation if upgrading from an older version.

Frequently Asked Questions

How does Miller compare to awk for CSV processing?

awk treats input as positional fields separated by delimiters. Miller understands named columns from CSV/TSV/JSON headers, so you reference $column_name instead of $1, $2. Miller also handles quoting, escaping, and multi-line CSV fields correctly, which plain delimiter-based splitting in awk does not.

Can Miller handle JSON input and output?

Yes. Miller supports JSON, JSON Lines, CSV, TSV, DKVP, XTAB, and other formats for both input and output. You can mix formats freely, such as reading CSV and outputting JSON with --icsv --ojson.

Does Miller support SQL-like joins?

Yes. The join verb joins two files on named key fields. By default it emits only paired records, like an SQL inner join. The --ul and --ur flags also emit unpaired records from the left and right files, which gives results similar to left, right, and full outer joins.

Can Miller do statistical aggregations?

Yes. The stats1 verb computes univariate statistics such as mean, median, standard deviation, min, max, and count. The stats2 verb computes bivariate statistics such as linear regression and correlation. Both accept -g to group by fields, similar to SQL GROUP BY.

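Is Miller fast enough for large files?

Miller v6 is written in Go and runs most verbs in streaming fashion, without loading entire files into memory, so it can handle multi-gigabyte files. Verbs that must see every record before producing output, such as sort and tac, do retain records in memory. For simple transformations on large files, streaming can make Miller faster and lighter than loading the whole file into Python pandas, though the difference depends on the workload.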
