Configs · April 14, 2026 · 1 min read

grex — A Command-Line Tool That Generates Regular Expressions from Examples

grex turns a list of example strings into a regex that matches them. Pass in the strings you want to match and get back a working pattern, instead of spending 30 minutes with a regex cheat sheet.

Introduction

grex (generate regex) is a Rust CLI that infers a regex from example strings. Instead of writing ^\d+\.\d+\.\d+$ by hand and debugging character classes, you hand grex some example matches and let it build the pattern.

With over 8,000 GitHub stars, grex is popular with data engineers, DevOps engineers, and anyone who processes logs or text. Its output is often a good starting point: you can tighten or loosen the generated regex by hand from there.

What grex Does

grex reads a list of strings and prints a regex that matches all of them. By default the pattern is as specific as possible, matching the given examples and nothing else. Flags then generalize it: convert digits, whitespace, or word characters to shorthand classes, detect repetitions, ignore case, escape non-ASCII characters, or drop the anchors.

Architecture Overview

Input strings
    |
[grex parser]
    |
[Character class inference]
   detect digits -> \d, letters -> \w
   detect case classes, Unicode categories
    |
[Quantifier detection]
   find repeating patterns {min,max}
    |
[Alternation optimization]
   shared prefixes/suffixes
    |
Output regex
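
The alternation step in the diagram can be illustrated with a small trie. The sketch below is a toy stand-in, not grex's implementation (grex's own documentation describes a DFA-based approach); the function names `build_trie`, `trie_to_regex`, and `regex_from_examples` are made up for this example. It only factors shared prefixes, not suffixes.

```python
import re


def build_trie(words):
    """Build a nested-dict trie; the key "" marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return root


def trie_to_regex(node):
    """Turn a trie node into a regex fragment that factors shared prefixes."""
    is_end = "" in node
    branches = [
        re.escape(ch) + trie_to_regex(child)
        for ch, child in sorted(node.items())
        if ch != ""
    ]
    if not branches:
        return ""
    if len(branches) == 1 and not is_end:
        return branches[0]
    body = "(?:" + "|".join(branches) + ")"
    # A word may end here, so everything after this point is optional.
    return body + "?" if is_end else body


def regex_from_examples(words):
    """Anchored regex that matches exactly the given words."""
    return "^" + trie_to_regex(build_trie(words)) + "$"


print(regex_from_examples(["error", "errno"]))  # ^err(?:no|or)$
```

Because the trie stores only the given words, the resulting pattern matches those words and nothing else, which mirrors the "most specific pattern" behavior described above.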

Self-Hosting & Configuration

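# Use character class shortcuts
grex -d "2024-01-15" "2024-02-28" "2024-12-31"
# Output: each digit is generalized to \d (four, two, and two digits separated by the literal hyphens); add -r to collapse repeats into quantifiers. grex does not infer month or day ranges.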

# Escape non-ASCII characters in the output (non-ASCII input itself needs no flag)
grex -e "café" "naïve" "résumé"
# Output uses Unicode escape sequences such as \u{e9} in place of the non-ASCII characters

# Word boundaries: grex has no dedicated flag, so drop the anchors and wrap the result in \b yourself
grex --no-anchors "error" "warning" "fatal"
# Then use: \b<generated alternation>\b

# Anchors off — regex may match inside longer strings
grex --no-anchors "hello" "world"
# Output: (?:hello|world)

# Read test cases from a file, one per line
grex -f error_patterns.txt
# Pipeline: generate an unanchored regex and pass it to ripgrep
rg -e "$(grex --no-anchors -f error_patterns.txt)" logs/
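
When grex is driven from a script, passing the examples as an argument list avoids shell word-splitting of examples that contain spaces. The following is a minimal sketch, assuming grex is installed and on PATH; `build_grex_command` and `run_grex` are hypothetical helper names, and only the -d, -r, and --no-anchors flags are wired up. Examples that start with a hyphen would need extra care and are not handled here.

```python
import shutil
import subprocess


def build_grex_command(examples, digits=False, repetitions=False, no_anchors=False):
    """Assemble an argv list for grex; each example stays one argument."""
    cmd = ["grex"]
    if digits:
        cmd.append("-d")
    if repetitions:
        cmd.append("-r")
    if no_anchors:
        cmd.append("--no-anchors")
    cmd.extend(examples)
    return cmd


def run_grex(examples, **flags):
    """Run grex if it is on PATH and return its output, else None."""
    if shutil.which("grex") is None:
        return None
    result = subprocess.run(
        build_grex_command(examples, **flags),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


print(build_grex_command(["disk full", "disk error"], no_anchors=True))
```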

Key Features

  • Inference from examples: pass in sample strings, get a regex
  • Character class detection: \d, \w, \s and their negations via -d, -w, -s (and -D, -W, -S)
  • Quantifier finding: with -r, converts "aaa" "aa" to "a{2,3}"
  • Alternation optimization: shared prefix/suffix factoring
  • Case-insensitive mode: -i for case folding
  • Unicode support: handles non-ASCII input natively
  • File input: -f reads test cases from a file, one per line
  • Rust binary: fast, single-file, no dependencies
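
The quantifier idea from the list above can be sketched in a few lines. This is an illustrative simplification, not grex's algorithm: it only handles examples that contain the same characters in the same order and differ in run lengths, and the names `runs` and `quantified_regex` are invented for the example. It assumes a non-empty list of examples.

```python
import re
from itertools import groupby


def runs(text):
    """Split text into (char, run_length) pairs, e.g. 'aab' -> [('a', 2), ('b', 1)]."""
    return [(ch, len(list(group))) for ch, group in groupby(text)]


def quantified_regex(examples):
    """Merge examples that share a character sequence but differ in run lengths."""
    all_runs = [runs(e) for e in examples]
    chars = [ch for ch, _ in all_runs[0]]
    if any([ch for ch, _ in r] != chars for r in all_runs):
        raise ValueError("examples do not share the same run structure")
    parts = []
    for i, ch in enumerate(chars):
        lengths = [r[i][1] for r in all_runs]
        lo, hi = min(lengths), max(lengths)
        if lo == hi == 1:
            quant = ""
        elif lo == hi:
            quant = "{%d}" % lo
        else:
            quant = "{%d,%d}" % (lo, hi)
        parts.append(re.escape(ch) + quant)
    return "^" + "".join(parts) + "$"


print(quantified_regex(["aaa", "aa"]))  # ^a{2,3}$
```

Note that a {min,max} range also accepts every length in between, so examples with run lengths 2 and 4 would produce a pattern that also matches length 3. That is one reason to review generated quantifiers before using them.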

Comparison with Similar Tools

Feature              | grex                       | regex101.com  | LLM prompt         | Debuggex         | regexr
Infers from examples | Yes                        | No            | Yes (imprecise)    | No               | No
Offline              | Yes                        | No (web)      | No                 | No (web)         | No (web)
Customization        | Flags                      | Visual editor | Prompt engineering | Visual editor    | Visual editor
Deterministic        | Yes                        | N/A           | No                 | N/A              | N/A
Best for             | Automated regex generation | Learning regex | Quick drafts      | Visual debugging | Visual debugging

FAQ

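Q: Will grex always produce the best regex? A: No. It produces a correct regex that matches your examples. Without flags the pattern is as specific as possible, so it may be too narrow for unseen input; with flags such as -d or -w it may be too broad. For production use, review and adjust it (e.g., replace \w with a narrower class) based on what else could match.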
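
One way to do that review is to keep a list of strings that must not match and test the pattern against both lists. The helper below is a small sketch using Python's re module; `check_regex` is a made-up name, and the sample pattern is the version-number regex from the introduction rather than actual grex output.

```python
import re


def check_regex(pattern, should_match, should_not_match):
    """Return the examples the pattern gets wrong, as (missed, false_positives)."""
    compiled = re.compile(pattern)
    missed = [s for s in should_match if compiled.search(s) is None]
    false_positives = [s for s in should_not_match if compiled.search(s) is not None]
    return missed, false_positives


missed, false_positives = check_regex(
    r"^\d+\.\d+\.\d+$",
    should_match=["1.2.3", "10.0.1"],
    should_not_match=["1.2", "v1.2.3", "1.2.3-beta"],
)
print(missed, false_positives)  # [] []
```

Two empty lists mean the pattern accepts every positive example and rejects every negative one in the sample; anything else points to where the pattern needs tightening or loosening.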

Q: Can it handle thousands of examples? A: Yes, although run time and the size of the output grow with the input. For very large input, deduplicate the examples first to speed things up.
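
Deduplication can be done before the examples ever reach grex. A minimal sketch, assuming the examples live in a text file with one example per line; the function names and file paths are placeholders for this illustration.

```python
from pathlib import Path


def dedupe_examples(lines):
    """Strip line endings, skip blank lines, and drop repeats in first-seen order."""
    cleaned = (line.rstrip("\r\n") for line in lines)
    return list(dict.fromkeys(line for line in cleaned if line))


def write_unique_examples(src, dst):
    """Copy unique, non-blank lines from src to dst; return how many lines were dropped."""
    lines = Path(src).read_text(encoding="utf-8").splitlines()
    unique = dedupe_examples(lines)
    Path(dst).write_text("\n".join(unique) + "\n", encoding="utf-8")
    return len(lines) - len(unique)


print(dedupe_examples(["timeout\n", "refused\n", "timeout\n", "\n"]))
# ['timeout', 'refused']
```

The deduplicated file can then be passed to grex with -f.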

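Q: Does grex generate PCRE or POSIX regex? A: It generates Perl-compatible (PCRE-style) expressions that also parse with Rust's regex crate. Tools such as rg, Python re, and Go regexp accept most of the output, but verify before relying on it, especially with -e, whose \u{...} escapes are not accepted by every engine. For strict POSIX-only targets, test carefully.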
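
A quick compatibility check is to try compiling the generated pattern in the target engine. The sketch below does this for Python's re module; `compiles_in_python` is a hypothetical helper, and the two sample patterns are hand-written stand-ins for grex output. Python's re expects \u followed by exactly four hex digits, so a braced \u{...} escape is rejected.

```python
import re


def compiles_in_python(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


print(compiles_in_python(r"^\d{4}\-\d{2}$"))  # True
print(compiles_in_python(r"^caf\u{e9}$"))     # False
```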

Q: grex vs asking an LLM? A: LLMs can be subtly wrong; grex is deterministic and correct for the inputs you give it. Use grex for automated pipelines; use LLMs when you also want explanations and edge-case coverage.
