# Miller: Like awk, sed, cut, join, and sort for CSV, TSV, JSON

> Miller (mlr) is a multi-purpose command-line tool for processing name-indexed data such as CSV, TSV, JSON, JSON Lines, and positionally-indexed records, blending awk-style expressions with pandas-like DataFrame operations.

## Quick Use

```bash
brew install miller

# Pretty-print a CSV
mlr --icsv --opprint cat data.csv

# Filter + compute
mlr --csv filter '$status == "active"' then put '$rate = $revenue / $visits' data.csv
```

## Introduction

Miller (`mlr`) is a single-binary Go tool that lets you treat CSV/TSV/JSON as first-class structured data from the shell. Instead of juggling `cut`, `awk`, and `jq`, Miller provides a unified verb grammar (`cat`, `head`, `tail`, `filter`, `put`, `stats1`, `join`, `reshape`) that works across formats and converts between them with a flag change.

## What Miller Does

- Reads/writes CSV, TSV, JSON, JSON Lines, PPRINT, NIDX, DKVP, Markdown.
- Provides verbs such as `filter`, `put`, `stats1`, `join`, `sort`, `tac`, and `reshape` (wide-to-long and long-to-wide).
- Supports a DSL with variables, functions, control flow, and regex.
- Streams data record by record, so it can handle files larger than RAM.
- Operates as a Unix filter and composes naturally with pipes.

## Architecture Overview

Miller parses the input stream into record objects (ordered maps of field → value), passes them through a verb chain, and emits them in the chosen output format. Verbs are chained with `then`; the DSL expression is compiled once and run per record. There is no intermediate DataFrame, so memory use stays constant for most operations. The exceptions are verbs such as `sort` and `join`, which must buffer records.

## Self-Hosting & Configuration

- Install via Homebrew, apt, dnf, Chocolatey, or download a static Go binary from the GitHub releases page.
- Zero config; behavior is driven by flags: `--icsv --ojson` converts CSV→JSON.
- Put frequently used default flags in `.mlrrc` to shorten repeated commands.
- Can run as an AWS Lambda layer for data prep in serverless ETL.

## Key Features

- One tool for CSV/TSV/JSON/DKVP/PPRINT that replaces 4–5 separate utilities.
- Streaming architecture with constant memory for most verbs.
- DSL rich enough for regex, dates, JSON paths, and higher-order functions.
- `tac`, `nest`, `unsparsify`, and `reshape` cover edge-case transforms.
- Written in Go: single static binary, no runtime dependencies.

## Comparison with Similar Tools

- **csvkit**: Python-based, with a separate command per operation; slower on big files.
- **xsv**: Rust CSV tool; very fast but CSV-only and no DSL.
- **jq**: JSON-only; unmatched for JSON but does not parse CSV natively.
- **q**: runs SQL over CSV/TSV; great for SQL fans but no streaming reshape.
- **duckdb CLI**: columnar SQL; heavier for small one-off pipelines.

## FAQ

**Q: Can Miller handle multi-GB files?**
A: Yes. Streaming verbs use constant memory; `sort` and `join` buffer records, so their memory use grows with the input.

**Q: Is the DSL Turing-complete?**
A: Effectively yes: it has variables, functions, loops, and conditionals.

**Q: Will it infer column types?**
A: Yes. Values that look like numbers are treated as numbers in arithmetic; everything else is a string. Use `asserting_int` to enforce an integer type.

**Q: Does Miller support Parquet?**
A: Not natively. Pair it with the duckdb CLI for the Parquet side and exchange data as CSV (for example, Miller output written with `--ocsv`).

## Sources

- https://github.com/johnkerl/miller
- https://miller.readthedocs.io

---

Source: https://tokrepo.com/en/workflows/bc103a0c-389d-11f1-9bc6-00163e2b0d79

Author: Script Depot