Skills · Apr 15, 2026 · 3 min read

Miller — Like awk, sed, cut, join, and sort for CSV, TSV, JSON

Miller (mlr) is a multi-purpose command-line tool for processing name-indexed data such as CSV, TSV, JSON, and JSON Lines, as well as positionally indexed records. It blends awk-style expressions with pandas-like DataFrame operations.

Script Depot · Community

Direct install command

npx -y tokrepo@latest install bc103a0c-389d-11f1-9bc6-00163e2b0d79 --target codex

Run after dry-run confirms the install plan.

TL;DR

Miller (mlr) processes CSV, TSV, JSON, and JSON Lines from the command line with awk-style filtering, sorting, joining, and statistical operations.

§01

What it is

Miller (mlr) is a command-line tool for processing structured data formats including CSV, TSV, JSON, and JSON Lines. It combines the functionality of awk, sed, cut, join, and sort into a single binary that understands named fields. Written in Go, Miller ships as a zero-dependency binary.

Miller targets data engineers, analysts, and developers who work with structured data files in the terminal. It bridges the gap between Unix text tools (which treat everything as unstructured text) and full data processing frameworks (which require writing programs).

§02

How it saves time or tokens

Miller operates on named columns directly, so you do not have to count field positions as you would in awk. It reads and writes multiple formats in a single command, so you can ingest CSV and emit JSON without a separate conversion step. Chaining verbs with the then keyword builds a multi-step pipeline in one command line. No language runtime setup or library imports are required.
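A minimal sketch of the difference, assuming mlr and awk are both on PATH. The file sales.csv and its column names are made up for illustration and do not come from any real dataset.

```shell
# Hypothetical sample data: the file name and columns are invented for this sketch.
workdir=$(mktemp -d)
cd "$workdir"
cat > sales.csv <<'EOF'
region,status,revenue
east,active,100
west,inactive,50
east,active,30
EOF

# awk addresses columns by position: status is field 2, revenue is field 3.
awk -F, 'NR > 1 && $2 == "active" { print $3 }' sales.csv

# Miller addresses the same columns by name, chains verbs with "then",
# and converts CSV input to JSON output in the same command.
mlr --icsv --ojson filter '$status == "active"' then cut -f region,revenue sales.csv
```

If a column is later inserted before status, the awk command reads the wrong field without complaint, while the Miller command keeps working because it never depended on position.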

§03

How to use

Install Miller via your package manager: brew install miller on macOS or apt install miller on Debian/Ubuntu.
Run mlr --csv head -n 5 data.csv to preview data, or pipe from stdin.
Chain operations with then: filter, sort, group-by, join, and statistical aggregations.
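The steps above might look like this in practice. This is a sketch under assumptions: the install lines are commented out because they depend on your platform, and data.csv here is a tiny made-up file.

```shell
# Install (pick the line for your platform):
#   brew install miller        # macOS
#   sudo apt install miller    # Debian/Ubuntu

# Hypothetical sample file, invented for this sketch.
workdir=$(mktemp -d)
cd "$workdir"
cat > data.csv <<'EOF'
name,visits,revenue
ann,4,100
bob,8,250
cy,2,40
EOF

# Preview the first rows from a file.
mlr --csv head -n 2 data.csv

# The same preview, reading from stdin instead of a file argument.
mlr --csv head -n 2 < data.csv

# Chain verbs with "then": sort by revenue descending (numeric), keep the top row.
mlr --csv sort -nr revenue then head -n 1 data.csv
```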

§04

Example

# Pretty-print a CSV file
mlr --icsv --opprint cat data.csv

# Filter rows and compute a new field
mlr --csv filter '$status == "active"' \
  then put '$rate = $revenue / $visits' \
  data.csv

# Convert CSV to JSON
mlr --icsv --ojson cat data.csv

# Group-by aggregation
mlr --csv stats1 -a mean,count -f revenue -g region data.csv

# Sort by region (lexical ascending), then by revenue (numeric descending)
mlr --csv sort -f region -nr revenue data.csv

§05

Related on TokRepo

Automation Tools — CLI tools for data processing and automation
Coding AI Tools — Developer productivity tools

§06

Common pitfalls

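Miller field names are case-sensitive. The CSV headers 'Name' and 'name' refer to different fields. Check your headers with mlr --csv head -n 1 data.csv.
The --from flag names the input file before the verb chain, which keeps the file name in one place while you edit a long then-chain. It does not resolve shell quoting conflicts by itself; for those, store the expression in a file and load it with put -f or filter -f.
Miller 6 (the Go rewrite) changed some command-line flags and behaviors relative to Miller 5 (the C version). Check the documentation if you are upgrading from an older version.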
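A small sketch of the first two pitfalls, assuming mlr is on PATH. The file names data.csv and rate.mlr are hypothetical. The rename step is just one possible way to normalize a header, and put -f loads a DSL expression from a file so the shell never has to quote it.

```shell
# Hypothetical sample file with a capitalized header, invented for this sketch.
workdir=$(mktemp -d)
cd "$workdir"
cat > data.csv <<'EOF'
Name,visits,revenue
Ann,4,100
Bob,8,100
EOF

# Inspect the exact header spelling before writing expressions against it.
mlr --csv head -n 1 data.csv

# One way to normalize a header: rename Name to name.
mlr --csv rename Name,name data.csv

# Keep the DSL expression in a file so the shell never sees its quotes or $ signs.
cat > rate.mlr <<'EOF'
$rate = $revenue / $visits
EOF

# --from names the input up front; put -f loads the expression from rate.mlr.
mlr --csv --from data.csv put -f rate.mlr
```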

Frequently Asked Questions

How does Miller compare to awk for CSV processing?

awk treats input as positional fields separated by a delimiter. Miller reads column names from CSV/TSV headers and JSON keys, so you reference $column_name instead of $1, $2. Miller also parses quoted fields, escapes, and embedded newlines in CSV, which awk's plain delimiter splitting does not.
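To make the quoting point concrete, here is a sketch that assumes mlr and awk are on PATH. The file people.csv is made up, and its first data row contains a comma inside a quoted field.

```shell
# Hypothetical sample file: the first name contains a comma inside quotes.
workdir=$(mktemp -d)
cd "$workdir"
cat > people.csv <<'EOF'
name,city
"Smith, Ann",Boston
Bob,Austin
EOF

# Splitting on every comma breaks the quoted name, so "field 2" is not the city for row 1.
awk -F, 'NR > 1 { print $2 }' people.csv

# Miller parses the quotes and selects the column by its header name.
mlr --csv cut -f city people.csv
```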

Can Miller handle JSON input and output?

Yes. Miller supports JSON, JSON Lines, CSV, TSV, DKVP, XTAB, and other formats for both input and output. You can mix formats freely, such as reading CSV and outputting JSON with --icsv --ojson.
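A short sketch of mixing formats, assuming mlr is on PATH. The file items.csv is a made-up example, and items.json is simply where this sketch chooses to store the intermediate result.

```shell
# Hypothetical sample file, invented for this sketch.
workdir=$(mktemp -d)
cd "$workdir"
cat > items.csv <<'EOF'
id,label
1,apple
2,pear
EOF

# CSV in, JSON out.
mlr --icsv --ojson cat items.csv > items.json

# JSON in, CSV out: for this simple file the round trip reproduces the original.
mlr --ijson --ocsv cat items.json

# CSV in, aligned pretty-print out, which is easier to read in a terminal.
mlr --icsv --opprint cat items.csv
```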

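Does Miller support SQL-like joins?

Yes. The join verb joins two files on named key fields. By default it emits only paired records, like an inner join. The --ul and --ur flags add unpaired records from the left and right inputs, which gives left, right, and full outer joins, and --np suppresses the paired records. The semantics resemble SQL JOIN, although the command-line syntax differs.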
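A sketch of the join verb on two tiny made-up files, assuming mlr is on PATH. The file given with -f is the left input and the file named at the end of the command is the right input. The names customers.csv and orders.csv are hypothetical.

```shell
# Hypothetical sample files, invented for this sketch.
workdir=$(mktemp -d)
cd "$workdir"
cat > customers.csv <<'EOF'
id,name
1,Ann
2,Bob
3,Cy
EOF
cat > orders.csv <<'EOF'
id,amount
1,10
2,20
4,40
EOF

# Paired records only (inner join) on the id field.
mlr --csv join -j id -f customers.csv orders.csv

# Only left records with no match: customers without an order.
mlr --csv join --np --ul -j id -f customers.csv orders.csv

# Only right records with no match: orders without a customer.
mlr --csv join --np --ur -j id -f customers.csv orders.csv
```

Dropping --np while keeping --ul gives left-join-style output: paired records plus unmatched left records. Because the unmatched records have fewer fields, following the join with the unsparsify verb is one way to keep CSV output rectangular.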

Can Miller do statistical aggregations?

Yes. The stats1 verb computes univariate statistics such as count, sum, mean, median, standard deviation, min, and max. The stats2 verb computes bivariate statistics such as correlation, covariance, and linear regression. Both accept -g to group by one or more fields, similar to SQL GROUP BY.
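A minimal grouped-aggregation sketch, assuming mlr is on PATH. The file sales.csv and its columns are made up for illustration.

```shell
# Hypothetical sample data, invented for this sketch.
workdir=$(mktemp -d)
cd "$workdir"
cat > sales.csv <<'EOF'
region,revenue
east,100
west,50
east,30
EOF

# Per-region count, sum, and mean of revenue (comparable to SQL GROUP BY region).
mlr --icsv --opprint stats1 -a count,sum,mean -f revenue -g region sales.csv

# The same accumulators over the whole file: omit -g.
mlr --icsv --opprint stats1 -a count,sum,mean -f revenue sales.csv
```

Output fields are named after the input field and the accumulator, for example revenue_count and revenue_mean.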

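Is Miller fast enough for large files?

Miller 6 is written in Go, and most verbs process records as a stream without loading the entire file into memory, so it can handle multi-gigabyte files. Verbs that must see every record before emitting output, such as sort and tac, hold the whole input in memory. For simple streaming transformations, Miller avoids the cost of loading a full dataset into a DataFrame as pandas does by default; actual speed relative to pandas depends on the workload.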

