Introduction
grex (generate regex) is a Rust CLI that infers a regex from example strings. Instead of writing ^\d+\.\d+\.\d+$ by hand and debugging character classes, you hand grex some example matches and let it build the pattern.
With over 8,000 GitHub stars, grex is popular with data engineers, DevOps engineers, and anyone who processes logs or text. Its output is a good starting point, which you can then tighten or loosen by hand.
What grex Does
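grex reads a list of strings and, by default, prints the most specific regex that matches exactly those strings and nothing else. The result is not guaranteed to be the shortest possible pattern. Flags generalize or reformat the result:
- -d, -w and -s convert digits, word characters and whitespace to \d, \w and \s, and -D, -W and -S produce the negated classes.
- -r turns repeated substrings into {min,max} quantifiers.
- -i matches case-insensitively.
- -e escapes non-ASCII characters.
- -g uses capturing instead of non-capturing groups.
- -x prints the regex in verbose mode.
- --no-anchors drops the ^ and $ anchors.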
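A minimal Python sketch shows how a regex can match a given list of strings and nothing else. It is an illustration only, not grex's Rust implementation, and `build_trie`, `trie_to_regex` and `exact_regex` are hypothetical names. The sketch inserts the examples into a prefix tree, which is an unminimized automaton, and prints the tree as an anchored regex. Shared prefixes are factored out automatically. Shared suffixes are not, because the tree is never minimized.

```python
import re

END = ""  # hypothetical marker key: an example ends at this node


def build_trie(words: list[str]) -> dict:
    """Insert every example into a nested-dict prefix tree."""
    root: dict = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = {}
    return root


def _optional(pattern: str) -> str:
    """Make a sub-pattern optional, grouping it unless it is a single character."""
    return pattern + "?" if len(pattern) == 1 else f"(?:{pattern})?"


def trie_to_regex(node: dict) -> str:
    """Render a (sub)trie as a regex body; sibling branches become an alternation."""
    branches = [
        re.escape(ch) + trie_to_regex(child)
        for ch, child in sorted(node.items())
        if ch != END
    ]
    if not branches:
        return ""  # leaf: nothing left to match
    if len(branches) == 1:
        body = branches[0]
        # If an example also ends here, the remaining suffix is optional.
        return _optional(body) if END in node else body
    group = "(?:" + "|".join(branches) + ")"
    return group + "?" if END in node else group


def exact_regex(words: list[str]) -> str:
    """Anchor the trie's regex so it matches only the given examples."""
    return "^" + trie_to_regex(build_trie(words)) + "$"


pattern = exact_regex(["car", "cart", "cat"])
print(pattern)  # ^ca(?:rt?|t)$
```

The sketch keeps shared suffixes separate. For ["abc", "bc"] it returns ^(?:abc|bc)$, whereas minimizing the automaton would let both examples share the bc suffix (for instance ^a?bc$).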
Architecture Overview
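```
Input strings
|
[Grapheme segmentation]
  split each string into user-perceived characters
  optional conversions: -d -> \d, -w -> \w, -s -> \s, -r -> {min,max}
|
[DFA construction]
  one path per string; strings with a common prefix share states
|
[DFA minimization]
  Hopcroft's algorithm merges equivalent states, sharing common suffixes
|
[DFA to regex]
  Brzozowski's algebraic method solves the automaton as linear equations
|
Output regex
```
Usage & Configuration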
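The first example below uses -d. Here is a rough Python sketch of what that kind of conversion does. It is an approximation, not grex's Rust code. `to_classes` and `generalize` are hypothetical names, and using str.isalnum plus underscore as a stand-in for \w is an assumption. Each character is replaced by its class before the pattern is built, and duplicates are then dropped. Every digit in the three dates becomes \d, so the dates collapse into a single pattern.

```python
import re


def to_classes(text: str, digits: bool = True, spaces: bool = False,
               words: bool = False) -> str:
    """Replace characters with class escapes, roughly like grex's -d, -s, -w."""
    out = []
    for ch in text:
        if digits and ch.isdecimal():                 # Unicode decimal digit, like \d
            out.append(r"\d")
        elif spaces and ch.isspace():                 # Unicode whitespace, like \s
            out.append(r"\s")
        elif words and (ch.isalnum() or ch == "_"):   # approximation of \w
            out.append(r"\w")
        else:
            out.append(re.escape(ch))                 # everything else stays literal
    return "".join(out)


def generalize(examples: list[str], **flags: bool) -> str:
    """Convert each example, deduplicate, and join the survivors as an alternation."""
    variants = sorted({to_classes(s, **flags) for s in examples})
    body = variants[0] if len(variants) == 1 else "(?:" + "|".join(variants) + ")"
    return f"^{body}$"


print(generalize(["2024-01-15", "2024-02-28", "2024-12-31"]))
# ^\d\d\d\d\-\d\d\-\d\d$
```

The sketch only shows why conversion makes inputs collapse. grex then feeds the converted strings through the DFA pipeline shown above rather than joining them into a flat alternation.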
```shell
# Convert every digit to \d
grex -d "2024-01-15" "2024-02-28" "2024-12-31"
# Output: ^\d\d\d\d\-\d\d\-\d\d$
# Non-ASCII input works without flags; -e writes non-ASCII characters as escapes
grex -e "café" "naïve" "résumé"
# Output: characters such as é and ï appear as \u{...} escape sequences
# -w converts word characters to \w (it does not add \b word boundaries)
grex -w "error" "warning" "fatal"
# Output: matches any string of exactly 5 or 7 word characters
# Anchors off: the regex may match inside longer strings
grex --no-anchors "hello" "world"
# Output: an alternation of hello and world with no ^ or $
# Read test cases from a file, one per line
grex -f error_patterns.txt
# Pipeline: build an unanchored regex from the file and search logs with ripgrep
rg "$(grex --no-anchors -f error_patterns.txt)" logs/
```
Key Features
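- Inference from examples: pass example strings (or a file with -f) and get a regex that matches exactly those strings
- Character class conversion: opt-in flags -d, -w, -s (and the negations -D, -W, -S) turn digits, word characters, and whitespace into \d, \w, \s
- Repetition detection: -r converts repeated substrings into quantifiers, e.g. "aa" "aaa" to a{2,3}
- Prefix and suffix sharing: DFA minimization factors common prefixes and suffixes out of alternations
- Case-insensitive mode: -i for case folding
- Unicode support: handles non-ASCII input, including grapheme clusters, natively
- Escaping: -e writes non-ASCII characters as \u{...} escape sequences
- Rust binary: fast, single self-contained executable with no runtime dependencies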
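The repetition idea can be sketched in a deliberately narrow form. This Python sketch uses a hypothetical `runs_to_regex` helper and is not grex's algorithm, which detects repeated substrings anywhere in an input. It only handles examples that are runs of a single character. It groups the run lengths into contiguous ranges and renders each range in {min,max} notation.

```python
import re


def quantify(atom: str, lo: int, hi: int) -> str:
    """Render `atom` repeated lo..hi times in regex quantifier notation."""
    if lo == hi:
        return atom if lo == 1 else f"{atom}{{{lo}}}"
    return f"{atom}{{{lo},{hi}}}"


def runs_to_regex(examples: list[str]) -> str:
    """Build a regex for examples that are all runs of one repeated character."""
    chars = {ch for s in examples for ch in s}
    if len(chars) != 1 or any(not s for s in examples):
        raise ValueError("every example must be a non-empty run of one character")
    atom = re.escape(chars.pop())
    lengths = sorted({len(s) for s in examples})

    # Group sorted lengths into contiguous ranges, e.g. [1, 2, 4] -> [(1, 2), (4, 4)].
    ranges: list[tuple[int, int]] = []
    for n in lengths:
        if ranges and n == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], n)
        else:
            ranges.append((n, n))

    parts = [quantify(atom, lo, hi) for lo, hi in ranges]
    body = parts[0] if len(parts) == 1 else "(?:" + "|".join(parts) + ")"
    return f"^{body}$"


print(runs_to_regex(["aa", "aaa"]))        # ^a{2,3}$
print(runs_to_regex(["a", "aa", "aaaa"]))  # ^(?:a{1,2}|a{4})$
```

The result still matches only the given lengths. A gap such as the missing "aaa" above becomes a separate alternative rather than being absorbed into a wider range.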
Comparison with Similar Tools
| Feature | grex | regex101.com | LLM prompt | Debuggex | regexr |
|---|---|---|---|---|---|
| Infers from examples | Yes | No | Yes (imprecise) | No | No |
| Offline | Yes | No (web) | No | No (web) | No (web) |
| Customizable | Flags | Visual editor | Prompt engineering | Visual editor | Visual editor |
| Deterministic | Yes | N/A | No | N/A | N/A |
| Best For | Automated regex generation | Learning regex | Quick drafts | Visual debugging | Visual debugging |
FAQ
Q: Will grex always produce the best regex?
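A: No. By default it produces the most specific regex: one that matches exactly your examples and nothing else, which is usually too narrow for unseen input. The conversion flags (-d, -w, -s, -r) generalize it. For production use, review the result against strings that should and should not match, then tighten or loosen it by hand.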
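One way to do that review is a small table-driven check. The Python sketch below uses a hypothetical `review` helper and reports every sample the pattern accepts or rejects unexpectedly. The date pattern is the kind of result a digit conversion gives, with every digit generalized to \d. The check shows that such a pattern still accepts impossible dates like 2024-13-45, a sign that it needs tightening.

```python
import re


def review(pattern: str, should_match: list[str], should_not_match: list[str]) -> list[str]:
    """Return a description of every sample the pattern handles wrongly."""
    regex = re.compile(pattern)
    problems = []
    for s in should_match:
        if regex.fullmatch(s) is None:
            problems.append(f"missed: {s}")
    for s in should_not_match:
        if regex.fullmatch(s) is not None:
            problems.append(f"accepted: {s}")
    return problems


date_pattern = r"^\d\d\d\d\-\d\d\-\d\d$"
print(review(date_pattern,
             should_match=["2024-01-15", "2024-12-31"],
             should_not_match=["2024-1-5", "2024-13-45"]))
# ['accepted: 2024-13-45']
```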
Q: Can it handle thousands of examples?
A: Yes, though runtime and the length of the generated regex grow with the number and variety of examples. Read large lists from a file with -f and deduplicate them first.
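A minimal Python sketch of the deduplication step follows. `unique_examples` is a hypothetical name, and writing the output file is left to the caller. The function strips line endings, skips blank lines, and keeps the first copy of each example in its original order. Surrounding spaces are kept, because grex treats them as part of the example.

```python
import io
from typing import Iterable


def unique_examples(lines: Iterable[str]) -> list[str]:
    """Strip line endings, skip blank lines, and keep the first copy of each example."""
    seen: dict[str, None] = {}
    for line in lines:
        example = line.rstrip("\r\n")
        if example:
            seen.setdefault(example, None)  # dicts preserve insertion order
    return list(seen)


raw = io.StringIO("GET /a\nGET /b\nGET /a\n\nGET /c\n")
print(unique_examples(raw))  # ['GET /a', 'GET /b', 'GET /c']
```

Writing "\n".join(result) to a file gives grex -f one example per line.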
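Q: Does grex generate PCRE or POSIX regex?
A: It generates Perl-style syntax that the Rust regex crate accepts, such as (?:...) groups and the \d, \w, \s classes. Most modern engines (ripgrep, PCRE, Python re, Go regexp) accept the default output, although class semantics can differ between them, for example in whether \d covers non-ASCII digits. POSIX ERE defines neither (?:...) groups nor \d, so tools limited to it (such as grep -E) need a hand-translated pattern.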
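Class semantics are one place where engines that share the syntax still disagree. Python's re module, like the Rust regex crate, treats \d as any Unicode decimal digit by default. Go's regexp limits \d to [0-9], and the re.ASCII flag reproduces that narrower behavior. The short check below uses U+0663 (ARABIC-INDIC DIGIT THREE) as the test character.

```python
import re

ARABIC_INDIC_THREE = "\u0663"  # a Unicode decimal digit (category Nd) outside 0-9

unicode_digit = re.fullmatch(r"\d", ARABIC_INDIC_THREE)          # Unicode-aware \d
ascii_digit = re.fullmatch(r"\d", ARABIC_INDIC_THREE, re.ASCII)  # [0-9] only

print(unicode_digit is not None)  # True
print(ascii_digit is not None)    # False
```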
Q: grex vs asking an LLM?
A: An LLM can be subtly wrong; grex is deterministic, and its output always matches every example you give it. Use grex in automated pipelines; use an LLM when you also want explanations and edge-case coverage.
Sources
- GitHub: https://github.com/pemistahl/grex
- License: Apache-2.0