# Miller — Like awk, sed, cut, join, and sort for CSV, TSV, JSON

> Miller (mlr) is a multi-purpose command-line tool for processing name-indexed data such as CSV, TSV, JSON, JSON Lines, and positionally-indexed records, blending awk-style expressions with pandas-like DataFrame operations.

## Quick Use
```bash
brew install miller
# Pretty-print a CSV
mlr --icsv --opprint cat data.csv
# Filter + compute
mlr --csv filter '$status == "active"' then put '$rate = $revenue / $visits' data.csv
```

## Introduction
Miller (`mlr`) is a single-binary Go tool that lets you treat CSV/TSV/JSON as first-class structured data from the shell. Instead of juggling `cut`, `awk`, and `jq`, Miller provides a unified verb grammar — `cat`, `head`, `tail`, `filter`, `put`, `stats1`, `join`, `reshape` — that works across formats and converts between them with a flag change.
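
To make the "flag change" concrete, here is a minimal sketch that sends the same small CSV through three output formats. The file `people.csv` and its columns are made up for illustration; only the output flag differs between the commands.

```bash
# Hypothetical sample data, created only for this illustration.
cat > people.csv <<'EOF'
name,age
alice,30
bob,25
EOF

# Same input and same verb (cat passes records through unchanged);
# only the output-format flag changes.
as_json=$(mlr --icsv --ojson cat people.csv)
as_tsv=$(mlr --icsv --otsv cat people.csv)
as_table=$(mlr --icsv --opprint cat people.csv)

printf '%s\n\n' "$as_json" "$as_tsv" "$as_table"
```

The input flag can be swapped the same way, for example `--ijson --ocsv` to go in the other direction.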

## What Miller Does
- Reads/writes CSV, TSV, JSON, JSON Lines, PPRINT, NIDX, DKVP, Markdown.
- Provides verbs such as `filter`, `put`, `stats1`, `join`, `sort`, `tac`, and `reshape` (wide-to-long and long-to-wide).
- Supports a DSL with variables, functions, control flow, and regex.
- Streams data row-by-row — handles files larger than RAM.
- Operates as a UNIX filter, so it composes naturally with pipes.
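
As a small sketch of the filter behavior listed above: with no format flags Miller reads and writes DKVP (comma-separated `key=value` pairs), and with no file name it reads standard input. The field names `a`, `b`, and `c` are illustrative, and the example assumes no `.mlrrc` is overriding the default format.

```bash
# Two DKVP records arrive on stdin; put adds a computed field to each;
# grep then keeps only the record whose sum is 7.
result=$(printf 'a=1,b=2\na=3,b=4\n' | mlr put '$c = $a + $b' | grep 'c=7')
echo "$result"   # prints a=3,b=4,c=7
```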

## Architecture Overview
Miller parses the input stream into record objects (ordered maps of field → value), passes them through a verb chain, and emits them in the chosen output format. Verbs are stackable; the DSL compiles once and runs per record. There is no intermediate DataFrame — memory is constant for most operations except `sort`/`join`.
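
A verb chain is written with `then` between verbs. The sketch below uses a made-up `sales.csv`: each record passes through `filter`, the survivors are ordered by `sort`, and `head` keeps the first two.

```bash
# Hypothetical sample data, created only for this illustration.
cat > sales.csv <<'EOF'
region,status,revenue
east,active,100
west,inactive,50
east,active,40
north,active,70
EOF

# filter drops inactive rows, sort -nr orders by revenue (numeric, descending),
# head -n 2 keeps the top two records.
top=$(mlr --icsv --ocsv \
  filter '$status == "active"' \
  then sort -nr revenue \
  then head -n 2 \
  sales.csv)
echo "$top"
# region,status,revenue
# east,active,100
# north,active,70
```

Here `sort` is the one verb that has to see every record before it can emit anything, which is why it falls outside the constant-memory case.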

## Self-Hosting & Configuration
- Install via Homebrew, apt, dnf, Chocolatey, or download a static Go binary from the GitHub releases page.
- Zero config; behavior driven by flags: `--icsv --ojson` converts CSV→JSON.
- Put default flags (for example `csv` or `ojson`) in `~/.mlrrc` to shorten repeated commands; the file stores flags, not verb pipelines.
- Can be packaged as an AWS Lambda layer for data prep in serverless ETL.
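
A minimal sketch of an rc file, assuming Miller 6: each line is a main flag (leading dashes are optional) and `#` starts a comment. To avoid touching a real `~/.mlrrc`, this example writes a throwaway file named `demo.mlrrc` (a hypothetical name) and points Miller at it through the `MLRRC` environment variable.

```bash
# A throwaway rc file that makes CSV the default input and output format.
cat > demo.mlrrc <<'EOF'
# Default to CSV for both input and output
csv
EOF

printf 'x,y\n1,2\n' > tiny.csv

# With MLRRC set, Miller loads this file instead of ~/.mlrrc,
# so the command line needs no --csv flag.
out=$(MLRRC=./demo.mlrrc mlr put '$z = $x + $y' tiny.csv)
echo "$out"
```

Flags given on the command line still apply, so a one-off `--ojson` can be added to the same command when a different output format is needed.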

## Key Features
- One tool for CSV/TSV/JSON/DKVP/PPRINT — replaces 4–5 utilities.
- Streaming architecture with constant memory for most verbs.
- DSL covers regex matching, date/time formatting, indexing into nested JSON maps and arrays, and higher-order functions.
- `tac`, `nest`, `unsparsify`, `reshape` cover edge-case transforms.
- Written in Go: single static binary, no runtime dependencies.
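
Two of the features above, sketched on made-up data: `unsparsify` for records whose keys differ, and the DSL's `sub` function for a regex replacement. File and field names are illustrative.

```bash
# Hypothetical records with different keys, one JSON object per line.
cat > events.json <<'EOF'
{"id": 1, "email": "alice@example.com"}
{"id": 2, "phone": "555-0100"}
EOF

# unsparsify fills missing keys with empty strings, so every record
# ends up with the same fields and the CSV output has a single header.
filled=$(mlr --ijson --ocsv unsparsify events.json)
echo "$filled"

# sub() replaces the first regex match; here it drops everything from "@" onward.
user=$(printf 'email\nalice@example.com\n' |
  mlr --icsv --ojson put '$user = sub($email, "@.*", "")')
echo "$user"
```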

## Comparison with Similar Tools
- **csvkit**: Python-based suite with a separate command for each operation; slower on big files.
- **xsv**: Rust CSV tool; very fast but CSV-only and no DSL.
- **jq**: unmatched for JSON, but JSON-only with no built-in CSV reader.
- **q**: runs SQL over CSV/TSV; great for SQL fans but no streaming reshape.
- **duckdb CLI**: columnar SQL; heavier for small one-off pipelines.

## FAQ
**Q: Can Miller handle multi-GB files?**
A: Yes. Streaming verbs use constant memory; `sort` and `join` are the exceptions, because they must hold records in memory before emitting output.
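
For `join`, a practical consequence is sketched below on two made-up files. In the default (unsorted) mode, the file named with `-f` is held in memory while the other file is streamed, so the smaller file is the natural choice for `-f`.

```bash
# Hypothetical lookup table (small) and transaction file (imagine it is large).
cat > users.csv <<'EOF'
id,name
1,alice
2,bob
EOF
cat > orders.csv <<'EOF'
id,total
2,9.50
1,20
2,3
EOF

# users.csv is the -f (left) file and is buffered; orders.csv is streamed past it.
joined=$(mlr --icsv --ocsv join -j id -f users.csv orders.csv)
echo "$joined"
```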

**Q: Is the DSL Turing-complete?**
A: Effectively yes — variables, functions, loops, conditionals.

**Q: Will it infer column types?**
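A: Yes, from the data itself: field values that look like integers or floats are treated as numbers, and everything else stays a string. Use `asserting_int` in the DSL to fail fast when a field is not an integer.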
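
A short sketch of both points, assuming Miller 6 and a made-up one-column file: `typeof` reports the type Miller inferred for each value, and `asserting_int` makes the run exit with an error when a value is not an integer.

```bash
# Hypothetical sample data: an integer, a float, and a plain string.
printf 'v\n42\n3.5\nabc\n' > values.csv

# Record the inferred type of each value in a new column.
types=$(mlr --icsv --ocsv put '$type = typeof($v)' values.csv)
echo "$types"

# asserting_int returns its argument when it is an int and aborts otherwise,
# so the non-integer rows make this command exit non-zero.
if mlr --icsv --ocsv put '$v = asserting_int($v)' values.csv >/dev/null 2>&1; then
  check="all integers"
else
  check="non-integer value found"
fi
echo "$check"
```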

**Q: Does Miller support Parquet?**
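A: Not natively. Use a tool such as the duckdb CLI to convert Parquet to CSV before Miller reads it, or have Miller write CSV (`--ocsv`) for that tool to turn into Parquet.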

## Sources
- https://github.com/johnkerl/miller
- https://miller.readthedocs.io

---
Source: https://tokrepo.com/en/workflows/bc103a0c-389d-11f1-9bc6-00163e2b0d79
Author: Script Depot