Introduction
Miller (mlr) is a single-binary Go tool that lets you treat CSV/TSV/JSON as first-class structured data from the shell. Instead of juggling cut, awk, and jq, Miller provides a unified verb grammar — cat, head, tail, filter, put, stats1, join, reshape — that works across formats and converts between them with a flag change.
What Miller Does
- Reads/writes CSV, TSV, JSON, JSON Lines, PPRINT, NIDX, DKVP, Markdown.
- Provides verbs (filter, put, stats1, join, sort, tac, reshape wide→long).
- Supports a DSL with variables, functions, control flow, and regex.
- Streams data row-by-row — handles files larger than RAM.
- Operates as a UNIX filter, so it composes naturally with pipes.
Architecture Overview
Miller parses the input stream into records (ordered maps of field → value), passes them through a chain of verbs, and emits them in the chosen output format. Verbs are chained with then; a DSL expression compiles once and runs per record. There is no intermediate DataFrame: memory use stays constant for streaming verbs, while verbs such as sort and join buffer records.
Self-Hosting & Configuration
- Install via Homebrew, apt, dnf, Chocolatey, or download a static Go binary from the GitHub releases page.
- Zero config; behavior is driven by flags: --icsv --ojson converts CSV→JSON.
- Put frequently used default flags in .mlrrc to shorten repeated commands.
- Can run as an AWS Lambda layer for data prep in serverless ETL.
Key Features
- One tool for CSV/TSV/JSON/DKVP/PPRINT — replaces 4–5 utilities.
- Streaming architecture with constant memory for most verbs.
- DSL rich enough for regex, dates, JSON paths, higher-order functions.
- tac, nest, unsparsify, reshape cover edge-case transforms.
- Written in Go: single static binary, no runtime dependencies.
Comparison with Similar Tools
- csvkit: Python-based, with a separate command per operation; slower on big files.
- xsv: Rust CSV tool; very fast but CSV-only and no DSL.
- jq: JSON-only; unmatched for JSON but cannot read CSV.
- q: runs SQL over CSV/TSV; great for SQL fans but no streaming reshape.
- duckdb CLI: columnar SQL; heavier for small one-off pipelines.
FAQ
Q: Can Miller handle multi-GB files?
A: Yes. Streaming verbs use constant memory; sort and join buffer records in memory.
Q: Is the DSL Turing-complete?
A: Effectively yes; it has variables, functions, loops, and conditionals.
Q: Will it infer column types?
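A: Yes, per value. Fields that look like numbers are treated as int or float, and everything else stays a string. Use asserting_int in the DSL to fail fast when a field is not an integer.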
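A short sketch of how this plays out, using a made-up file types.csv. The typeof function reports the type inferred for each value, and asserting_int makes the run stop with a non-zero exit status at the first record whose field is not an integer.

```shell
# Sample input where qty is numeric in one row and text in the other (hypothetical data).
printf 'id,qty\n1,3\n2,abc\n' > types.csv

# Record the inferred type of qty for every row.
mlr --icsv --ocsv put '$kind = typeof($qty)' types.csv > kinds.csv
cat kinds.csv

# Enforce the type: the second row is not an int, so mlr exits with an error.
if mlr --icsv --ocsv put '$qty = asserting_int($qty)' types.csv > checked.csv 2> errors.log; then
  status="ok"
else
  status="failed"
fi
echo "type check: $status"
```

Checking the exit status this way lets a surrounding script decide whether to continue or abort the pipeline.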
Q: Does Miller support Parquet?
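A: Not natively. Convert Parquet to CSV or JSON with another tool such as the duckdb CLI and pipe the result into Miller; in the other direction, write CSV with mlr --ocsv for a Parquet-capable tool to load.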