A grep-like tool which understands source code syntax and allows for manipulation in addition to search.
Like grep, regular expressions are a core primitive. Unlike grep, additional capabilities allow for higher precision, with options for manipulation. This allows srgn to operate along dimensions regular expressions and IDE tooling (Rename all, Find all references, ...) alone cannot, complementing them.
srgn is organized around actions to take (if any), acting only within precise, optionally language grammar-aware scopes. In terms of existing tools, think of it as a mix of tr, sed, ripgrep and tree-sitter, with a design goal of simplicity: if you know regex and the basics of the language you are working with, you are good to go.
Tip
All code snippets displayed here are verified as part of unit tests using the actual srgn binary. What is showcased here is guaranteed to work.
The simplest srgn usage works similarly to tr:
$ echo 'Hello World!' | srgn '[wW]orld' 'there' # replacement
Hello there!
Matches for the regular expression pattern '[wW]orld' (the scope) are replaced (the action) by the second positional argument. Zero or more actions can be specified:
$ echo 'Hello World!' | srgn '[wW]orld' # zero actions: input returned unchanged
Hello World!
$ echo 'Hello World!' | srgn --upper '[wW]orld' 'you' # two actions: replacement, afterwards uppercasing
Hello YOU!
Replacement is always performed first and specified positionally. Any other actions are applied after and given as command line flags.
Similarly, more than one scope can be specified: in addition to the regex pattern, a language grammar-aware scope can be given, which scopes to syntactical elements of source code (think, for example, "all bodies of class definitions in Python"). If both are given, the regular expression pattern is only applied within that first, language scope. This enables search and manipulation at a precision not normally possible using plain regular expressions, and serves a dimension different from tools such as Rename all in IDEs.
For example, consider this (pointless) Python source file:
"""Module for watching birds and their age."""

from dataclasses import dataclass


@dataclass
class Bird:
    """A bird!"""

    name: str
    age: int

    def celebrate_birthday(self):
        print("?")
        self.age += 1

    @classmethod
    def from_egg(egg):
        """Create a bird from an egg."""
        pass # No bird here yet!


def register_bird(bird: Bird, db: Db) -> None:
    assert bird.age >= 0
    with db.tx() as tx:
        tx.insert(bird)
which can be searched using:
$ cat birds.py | srgn --python 'class' 'age'
11: age: int
15: self.age += 1
The string age was sought and found only within Python class definitions (and not, for example, in function bodies such as register_bird, where age also occurs and would be nigh impossible to exclude from consideration in vanilla grep). By default, this 'search mode' also prints line numbers. Search mode is entered if no actions are specified and a language such as --python is given[1]; think of it like 'ripgrep but with syntactical language elements'.
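For intuition, a line-granular approximation of --python 'class' 'age' can be written with Python's standard `ast` module instead of tree-sitter. This is a hypothetical sketch: `search_in_classes` and the trimmed sample are illustrative only. It finds class definitions, then runs the regex only over the lines they span, and prints matches with line numbers the way search mode does.

```python
import ast
import re

# Trimmed, illustrative sample: `age` occurs both inside and outside the class.
SOURCE = '''\
class Bird:
    name: str
    age: int

    def celebrate_birthday(self):
        self.age += 1


def register_bird(bird, db):
    assert bird.age >= 0
'''


def search_in_classes(source: str, pattern: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs matching `pattern`, only within class definitions."""
    lines = source.splitlines()
    hits: dict[int, str] = {}  # keyed by line number, so nested classes do not duplicate hits
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            # lineno/end_lineno are 1-based and inclusive (end_lineno needs Python 3.8+).
            for number in range(node.lineno, node.end_lineno + 1):
                line = lines[number - 1]
                if re.search(pattern, line):
                    hits[number] = line
    return sorted(hits.items())


for number, line in search_in_classes(SOURCE, r"age"):
    print(f"{number}: {line}")
```

The occurrence in register_bird is never reported, because its lines fall outside every class span.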
Searching can also be performed across lines, for example to find methods (aka def within class) lacking docstrings:
$ cat birds.py | srgn --python 'class' 'def .+:\n\s+[^"\s]{3}' # do not try this pattern at home
13: def celebrate_birthday(self):
14: print("?")
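The multi-line pattern is plain regex and can be checked in isolation. This sketch assumes Python's `re`, while srgn uses a Rust regex engine, so it only mirrors the idea. The pattern matches a def line followed by an indented body whose first three characters are neither quotes nor whitespace, which is why a leading docstring prevents a match. The two method bodies are illustrative.

```python
import re

# Same pattern as in the command above; Python's `re` accepts it unchanged.
METHOD_WITHOUT_DOCSTRING = re.compile(r'def .+:\n\s+[^"\s]{3}')

with_docstring = 'def from_egg(egg):\n        """Create a bird from an egg."""\n'
without_docstring = 'def celebrate_birthday(self):\n        print("party")\n'

print(bool(METHOD_WITHOUT_DOCSTRING.search(with_docstring)))     # False: body starts with a quote
print(bool(METHOD_WITHOUT_DOCSTRING.search(without_docstring)))  # True: body starts with "pri"
```

Note that \s+ can also consume newlines, so a blank line between the def and its first statement still matches.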
Note how this surfaces neither from_egg (has a docstring) nor register_bird (not a method, as its def is outside class).
Language scopes themselves can be specified multiple times as well. For example, in the Rust snippet
pub enum Genre {
    Rock(Subgenre),
    Jazz,
}

const MOST_POPULAR_SUBGENRE: Subgenre = Subgenre::Something;

pub struct Musician {
    name: String,
    genres: Vec<Subgenre>,
}
multiple items can be surgically drilled down into as
$ cat music.rs | srgn --rust 'pub-enum' --rust 'type-identifier' 'Subgenre' # AND'ed together
2: Rock(Subgenre),
where only lines matching all criteria are returned, acting like a logical AND between all conditions. Note that conditions are evaluated left-to-right, precluding some combinations from making sense: for example, searching for a Python class body inside of Python docstrings usually returns nothing. The inverse works as expected, however:
$ cat birds.py | srgn --py 'class' --py 'doc-strings'
8: """A bird!"""
19: """Create a bird from an egg."""
No docstrings outside class bodies are surfaced!
The -j flag changes this behavior: from intersecting left-to-right, to running all queries independently and joining their results, allowing you to search multiple ways at once:
$ cat birds.py | srgn -j --python 'comments' --python 'doc-strings' 'bird[^s]'
8: """A bird!"""
19: """Create a bird from an egg."""
20: pass # No bird here yet!
The pattern bird[^s] was found inside comments or docstrings alike, not just "docstrings within comments".
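The difference between the default intersection and -j can be modelled with half-open character spans. This is a hypothetical model, not srgn's internals. `intersect` keeps only the parts of the second scope that lie inside the first, while `join` keeps everything found by either scope and merges overlaps. The span positions are made up.

```python
Span = tuple[int, int]  # half-open character range [start, end)


def intersect(outer: list[Span], inner: list[Span]) -> list[Span]:
    """Default behavior: keep only the parts of `inner` that lie inside `outer`."""
    result = []
    for s1, e1 in outer:
        for s2, e2 in inner:
            start, end = max(s1, s2), min(e1, e2)
            if start < end:  # non-empty overlap
                result.append((start, end))
    return sorted(result)


def join(first: list[Span], second: list[Span]) -> list[Span]:
    """-j-style behavior: everything in either scope, with overlapping spans merged."""
    merged: list[Span] = []
    for start, end in sorted(first + second):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


comments = [(50, 70)]               # hypothetical position of one comment
docstrings = [(10, 25), (80, 110)]  # hypothetical positions of two docstrings
print(intersect(comments, docstrings))  # []  (no docstring lies inside a comment)
print(join(comments, docstrings))       # [(10, 25), (50, 70), (80, 110)]
```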
If standard input is not given, srgn knows how to find relevant source files automatically, for example in this repository:
$ srgn --python 'class' 'age'
docs/samples/birds
11: age: int
15: self.age += 1
docs/samples/birds.py
9: age: int
13: self.age += 1
It recursively walks its current directory, finding files based on file extensions and shebang lines, processing at very high speed. For example, srgn --go strings '\d+' finds and prints all ~140,000 runs of digits in literal Go strings inside the Kubernetes codebase of ~3,000,000 lines of Go code within 3 seconds on 12 cores of an M3. For more on working with many files, see below.
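File discovery along these lines can be sketched with the standard library. This is a simplified, hypothetical version: the function name and the crude shebang check are assumptions. It walks a directory tree and selects Python files either by their .py extension or by a shebang line mentioning python, which is how an extensionless file such as docs/samples/birds can be picked up. It is single-threaded and makes no attempt at srgn's speed.

```python
import os
from pathlib import Path


def find_python_files(root: str) -> list[Path]:
    """Recursively collect files that look like Python, by extension or by shebang line."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.suffix == ".py":
                found.append(path)
                continue
            try:
                with path.open("rb") as handle:
                    first_line = handle.readline()
            except OSError:
                continue  # unreadable file: skip it
            # Crude check, e.g. "#!/usr/bin/env python3" or "#!/usr/bin/python".
            if first_line.startswith(b"#!") and b"python" in first_line:
                found.append(path)
    return sorted(found)


for path in find_python_files("."):
    print(path)
```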
Scopes and actions can be combined almost arbitrarily (though many combinations are not going to be use- or even meaningful). For example, consider this Python snippet (for examples using other supported languages see below):
"""GNU module."""

def GNU_says_moo():
    """The GNU function -> say moo -> ✅"""

    GNU = """
      GNU
    """ # the GNU...

    print(GNU + " says moo") # ...says moo
against which the following command is run:
cat gnu.py | srgn --titlecase --python 'doc-strings' '(?<!The )GNU ([a-z]+)' '$1: GNU ? is not Unix'
The anatomy of that invocation is:
--titlecase (an action) will Titlecase Everything Found In Scope
--python 'doc-strings' (a scope) will scope to (i.e., only take into consideration) docstrings according to the Python language grammar
'(?<!The )GNU ([a-z]+)' (a scope) sees only what was already scoped by the previous option, and will narrow it down further. It can never extend the previous scope. The regular expression scope is applied after any language scope(s).
(?<!The ) is negative lookbehind syntax, demonstrating how this advanced feature is available. Occurrences of GNU prefixed by "The " will not be considered.
'$1: GNU ? is not Unix' (an action) will replace each matched occurrence (i.e., each input section found to be in scope) with this string. Matched occurrences are patterns of '(?<!The )GNU ([a-z]+)' only within Python docstrings. Notably, this replacement string demonstrates:
$1, which carries the contents captured by the first capturing regex group. That's ([a-z]+), as (?<!The ) is not capturing.
The command makes use of multiple scopes (language and regex pattern) and multiple actions (replacement and titlecasing). The result then reads
"""Module: GNU ? Is Not Unix."""

def GNU_says_moo():
    """The GNU function -> say moo -> ✅"""

    GNU = """
      GNU
    """ # the GNU...

    print(GNU + " says moo") # ...says moo
where the changes are limited to:
- """GNU module."""
+ """Module: GNU ? Is Not Unix."""
  def GNU_says_moo():
      """The GNU function -> say moo -> ✅"""
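The regex half of that invocation can be reproduced with Python's `re` module, which also supports fixed-width negative lookbehind. The sketch makes several simplifications:

- It skips docstring extraction and takes the docstring text directly.
- Python writes $1 as \1.
- The emoji is left out of the replacement.
- `titlecase_words` is a hypothetical stand-in that uppercases each word's first letter without lowering the rest, so GNU stays GNU.

```python
import re

# The regex scope from the srgn invocation above.
PATTERN = re.compile(r"(?<!The )GNU ([a-z]+)")


def titlecase_words(text: str) -> str:
    """Uppercase the first letter of each word, leaving the remaining letters untouched."""
    return re.sub(r"\b(\w)", lambda m: m.group(1).upper(), text)


def transform(docstring: str) -> str:
    # Replacement first (`\1` is srgn's `$1`), then titlecasing of each replaced match only.
    return PATTERN.sub(lambda m: titlecase_words(m.expand(r"\1: GNU is not Unix")), docstring)


print(transform("GNU module."))                       # Module: GNU Is Not Unix.
print(transform("The GNU function -> say moo -> ✅"))  # unchanged: GNU is preceded by "The "
```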
Warning
While srgn is in beta (major version 0), make sure to only (recursively) process files you can safely restore.
Search mode does not overwrite files, so it is always safe.
See below for the full help output of the tool.
Note
Supported languages are
Download a prebuilt binary from the releases.
This crate provides its binaries in a format compatible with cargo-binstall:
cargo install cargo-binstall
(might take a while)
cargo binstall srgn
(couple seconds, as it downloads prebuilt binaries from GitHub)
These steps are guaranteed to work™, as they are tested in CI. They also work if no prebuilt binaries are available for your platform, as the tool will fall back to compiling from source.
A Homebrew formula is available via:
brew install srgn
Available on Nix via nixpkgs unstable:
nix-shell -p srgn
Available on Arch Linux via the AUR.
A MacPorts port is available:
sudo port install srgn
All GitHub Actions runner images come with cargo preinstalled, and cargo-binstall provides a convenient GitHub Action:
jobs:
  srgn:
    name: Install srgn in CI
    # All three major OSes work
    runs-on: ubuntu-latest
    steps:
      - uses: cargo-bins/cargo-binstall@main
      - name: Install binary
        run: >
          cargo binstall
          --no-confirm
          srgn
      - name: Use binary
        run: srgn --version
The above concludes in just 5 seconds total, as no compilation is required. For more context, see cargo-binstall's advice on CI.
Compiling from source requires a C compiler:
On Linux, gcc works.
On macOS, use clang.
On Windows, MSVC works. Select "Desktop development with C++" during installation.
cargo install srgn
To use srgn as a Rust library instead:
cargo add srgn
See here for more.
Various shells are supported for shell completion scripts. For example, append eval "$(srgn --completions zsh)" to ~/.zshrc for completions in ZSH. An interactive session can then look like:
The tool is designed around scopes and actions. Scopes narrow down the parts of the input to process. Actions then perform the processing. Generally, both scopes and actions are composable, so more than one of each may be passed. Both are optional (but taking no action is pointless); specifying no scope implies the entire input is in scope.
At the same time, there is considerable overlap with plain tr: the tool is designed to have close correspondence in the most common use cases, and only go beyond when needed.
The simplest action is replacement. It is specially accessed (as an argument, not an option) for compatibility with tr, and general ergonomics. All other actions are given as flags, or options should they take a value.
For example, simple, single-character replacements work as in tr:
$ echo 'Hello, World!' | srgn 'H' 'J'
Jello, World!
The first argument is the scope (literal H in this case). Anything matched by it is subject to processing (replacement by J, the second argument, in this case). However, there is no direct concept of character classes as in tr. Instead, by default, the scope is a regular expression pattern, so its classes can be used to similar effect:
$ echo 'Hello, World!' | srgn '[a-z]' '_'
H____, W____!
The replacement occurs greedily across the entire match by default (note the UTS character class, reminiscent of tr's [:alnum:]):
$ echo 'ghp_oHn0As3cr3T!!' | srgn 'ghp_[[:alnum:]]+' '*' # A GitHub token
*!!
Advanced regex features are supported, for example lookarounds:
$ echo 'ghp_oHn0As3cr3T' | srgn '(?<=ghp_)[[:alnum:]]+' '*'
ghp_*
Take care in using these safely, as advanced patterns come without certain safety and performance guarantees. If they aren't used, performance is not impacted.
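The same two token examples translate to Python's `re` module almost verbatim, which may help readers who already know that engine. Python does not support POSIX classes like [[:alnum:]], so the ASCII range [A-Za-z0-9] stands in for it here.

```python
import re

# [A-Za-z0-9] replaces the POSIX class [[:alnum:]], which Python's `re` lacks.
print(re.sub(r"ghp_[A-Za-z0-9]+", "*", "ghp_oHn0As3cr3T!!"))     # *!!  (whole match replaced once)
print(re.sub(r"(?<=ghp_)[A-Za-z0-9]+", "*", "ghp_oHn0As3cr3T"))  # ghp_*  (lookbehind keeps the prefix)
```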
The replacement is not limited to a single character. It can be any string, for example to fix this quote:
$ echo '"Using regex, I now have no issues."' | srgn 'no issues' '2 problems'
"Using regex, I now have 2 problems."
The tool is fully Unicode-aware, with useful support for certain advanced character classes:
$ echo 'Mood: ?' | srgn '?' '?'
Mood: ?
$ echo 'Mood: ???? :(' | srgn '\p{Emoji_Presentation}' '?'
Mood: ???? :(
Replacements are aware of variables, which are made accessible for use through regex capture groups. Capture groups can be numbered, or optionally named. The zeroth capture group corresponds to the entire match.
$ echo 'Swap It' | srgn '(\w+) (\w+)' '$2 $1' # Regular, numbered
It Swap
$ echo 'Swap It' | srgn '(\w+) (\w+)' '$2 $1$1$1' # Use as many times as you'd like
It SwapSwapSwap
$ echo 'Call +1-206-555-0100!' | srgn 'Call (\+?\d-\d{3}-\d{3}-\d{4}).+' 'The phone number in "$0" is: $1.' # Variable `0` is the entire match
The phone number in "Call +1-206-555-0100!" is: +1-206-555-0100.
A more advanced use case is, for example, code refactoring using named capture groups (perhaps you can come up with a more useful one...):
$ echo 'let x = 3;' | srgn 'let (?<var>[a-z]+) = (?<expr>.+);' 'const $var$var = $expr + $expr;'
const xx = 3 + 3;
As in bash, use curly braces to disambiguate variables from immediately adjacent content:
$ echo '12' | srgn '(\d)(\d)' '$2${1}1'
211
$ echo '12' | srgn '(\d)(\d)' '$2$11' # will fail (`11` is unknown)
$ echo '12' | srgn '(\d)(\d)' '$2${11' # will fail (brace was not closed)
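For comparison, the capture-group examples above map onto Python's `re.sub` templates as follows. The syntax differs: Python uses \1 and \g<name> instead of $1 and $name, and (?P<name>...) for named groups. The results match the srgn examples, including the failure on a reference to the nonexistent group 11.

```python
import re

# Numbered groups: srgn's `$2 $1` is `\2 \1` in Python.
print(re.sub(r"(\w+) (\w+)", r"\2 \1", "Swap It"))      # It Swap
print(re.sub(r"(\w+) (\w+)", r"\2 \1\1\1", "Swap It"))  # It SwapSwapSwap

# Group 0 is the entire match (`$0` in srgn, `\g<0>` in Python).
phone = r"Call (\+?\d-\d{3}-\d{3}-\d{4}).+"
print(re.sub(phone, r'The phone number in "\g<0>" is: \1.', "Call +1-206-555-0100!"))

# Named groups: `(?<var>...)` / `$var` become `(?P<var>...)` / `\g<var>`.
let_pattern = r"let (?P<var>[a-z]+) = (?P<expr>.+);"
print(re.sub(let_pattern, r"const \g<var>\g<var> = \g<expr> + \g<expr>;", "let x = 3;"))  # const xx = 3 + 3;

# Disambiguation: `\g<1>1` plays the role of srgn's `${1}1`.
print(re.sub(r"(\d)(\d)", r"\2\g<1>1", "12"))  # 211

# `\11` refers to a nonexistent group 11, so Python rejects the template.
try:
    re.sub(r"(\d)(\d)", r"\2\11", "12")
except re.error as err:
    print("error:", err)
```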
Seeing how the replacement is merely a static string, its usefulness is limited. This is where tr's secret sauce ordinarily comes into play: using its character classes, which are valid in the second position as well, neatly translating from members of the first to the second. Here, those classes are instead regexes, and only valid in first position (the scope). A regular expression being a state machine, it is impossible to match onto a 'list of characters', which in tr is the second (optional) argument. That concept is out the window, and its flexibility lost.
Instead, the offered actions, all of them fixed, are used. A peek at the most common use cases for tr reveals that the provided set of actions covers virtually all of them! Feel free to file an issue if your use case is not covered.
Onto the next action.
Deletion (-d) removes whatever is found from the input. Same flag name as in tr.
$ echo 'Hello, World!' | srgn -d '(H|W|!)'
ello, orld
Note
As the default scope is to match the entire input, it is an error to specify deletion without a scope.
Squeezing (-s) collapses repeats of characters matching the scope into single occurrences. Same flag name as in tr.
$ echo 'Helloooo Woooorld!!!' | srgn -s '(o|!)'
Hello World!
If a character class is passed, all members of that class are squeezed into whatever class member was encountered first:
$ echo 'The number is: 3490834' | srgn -s '\d'
The number is: 3
Greediness in matching is not modified, so take care:
$ echo 'Winter is coming... ???' | srgn -s '?+'
Winter is coming... ???
Note
The pattern matched the entire run of suns, so there's nothing to squeeze. Summer prevails.
Invert greediness if the use case calls for it:
$ echo 'Winter is coming... ???' | srgn -s '?+?' '☃️'
Winter is coming... ☃️
Note
Again, as with deletion, specifying squeezing without an explicit scope is an error. Otherwise, the entire input is squeezed.
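A hypothetical Python model of squeezing, consistent with the examples above, looks like this. Each run of directly adjacent matches collapses into its first match. Because greediness is untouched, a single greedy match has nothing to squeeze. The letter x stands in for the emoji, and `squeeze` is an illustrative name, not srgn's API.

```python
import re


def squeeze(pattern: str, text: str) -> str:
    """Collapse runs of directly adjacent matches of `pattern` into the first match of each run."""
    pieces = []
    position = 0       # end of the text consumed so far
    previous_end = -1  # end of the previous match, used to detect adjacency
    for match in re.finditer(pattern, text):
        pieces.append(text[position:match.start()])
        if match.start() != previous_end:
            pieces.append(match.group(0))  # first match of a run: keep it
        # Otherwise the match directly follows the previous one and is dropped.
        previous_end = position = match.end()
    pieces.append(text[position:])
    return "".join(pieces)


print(squeeze(r"(o|!)", "Helloooo Woooorld!!!"))   # Hello World!
print(squeeze(r"\d", "The number is: 3490834"))   # The number is: 3
print(squeeze(r"x+", "Winter is coming... xxx"))  # unchanged: one greedy match
print(squeeze(r"x+?", "Winter is coming... xxx")) # Winter is coming... x
```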
Changing character casing makes up a good chunk of tr usage. It's very straightforward.
$ echo 'Hello, World!' | srgn --lower
hello, world!
$ echo 'Hello, World!' | srgn --upper
HELLO, WORLD!
$ echo 'hello, world!' | srgn --titlecase
Hello, World!
The --normalize action decomposes input according to Normalization Form D, and then discards code points of the Mark category (see examples). That roughly means: take fancy character, rip off dangly bits, throw those away.
$ echo 'Naïve jalapeño ärgert mgła' | srgn -d '\P{ASCII}' # Naive approach
Nave jalapeo rgert mga
$ echo 'Naïve jalapeño ärgert mgła' | srgn --normalize # Normalize is smarter
Naive jalapeno argert mgła
Notice how mgła is out of scope for NFD, as it is "atomic" and thus not decomposable (at least that's what ChatGPT whispers in my ear).
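The two steps described above, NFD decomposition followed by dropping Mark code points, can be reproduced with Python's `unicodedata` module. This is a sketch for comparison, not srgn's code. It also shows the naive deletion of all non-ASCII characters.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """NFD-decompose, then drop every code point whose category is a Mark (M*)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.category(ch).startswith("M"))


sample = "Naïve jalapeño ärgert mgła"
print(re.sub(r"[^\x00-\x7F]", "", sample))  # naive approach: Nave jalapeo rgert mga
print(normalize(sample))                    # Naive jalapeno argert mgła
```

The ł survives because it has no canonical decomposition, so there is no Mark to strip.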
The --symbols action replaces multi-character ASCII symbols with appropriate single-code-point, native Unicode counterparts.
$ echo '(A --> B) != C --- obviously' | srgn --symbols
(A ⟶ B) ≠ C — obviously
Alternatively, if you're only interested in math, make use of scoping:
$ echo 'A <= B --- More is--obviously--possible' | srgn --symbols '<='
A ≤ B --- More is--obviously--possible
As there is a 1:1 correspondence between an ASCII symbol and its replacement, the effect is reversible[2]:
$ echo 'A ⇒ B' | srgn --symbols --invert
A => B
There is only a limited set of symbols supported as of right now, but more can be added.
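The reversibility follows from the table being one-to-one: swapping keys and values gives the inverse. The sketch below assumes a tiny, hypothetical table built only from the pairs shown in the examples above; srgn's actual table is different. It tries longer ASCII sequences first so that a longer symbol is never split by a shorter one.

```python
import re

# Hypothetical subset of the symbol table, taken only from the examples above.
SYMBOLS = {"-->": "⟶", "!=": "≠", "<=": "≤", "=>": "⇒"}
INVERSE = {native: ascii_seq for ascii_seq, native in SYMBOLS.items()}  # valid since values are unique


def _translate(text: str, table: dict[str, str]) -> str:
    # Longest keys first, so the alternation prefers longer symbols.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(map(re.escape, keys)))
    return pattern.sub(lambda m: table[m.group(0)], text)


def symbols(text: str, invert: bool = False) -> str:
    return _translate(text, INVERSE if invert else SYMBOLS)


print(symbols("(A --> B) != C"))      # (A ⟶ B) ≠ C
print(symbols("A ⇒ B", invert=True))  # A => B
```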
The --german action replaces alternative spellings of German special characters (ae, oe, ue, ss) with their native versions (ä, ö, ü, ß)[3].
$ echo 'Gruess Gott, Neueroeffnungen, Poeten und Abenteuergruetze!' | srgn --german
Grüß Gott, Neueröffnungen, Poeten und Abenteuergrütze!
This action is based on a word list (compile without the german feature if this bloats your binary too much). Note the following features about the above example:
Poeten remained as-is, instead of being naively and mistakenly converted to Pöten
Abenteuergrütze is not going to be found in any reasonable word list, but was handled properly nonetheless
Abenteuer remained as-is as well, instead of being incorrectly converted to Abenteür
Neueroeffnungen sneakily forms a ue element neither constituent word (neu, Eröffnungen) possesses, but is still processed correctly (despite the mismatched casings as well)
On request, replacements may be forced, as is potentially useful for names:
$ echo 'Frau Loetter steht ueber der Mauer.' | srgn --german-naive '(?<=Frau )\w+'
Frau Lötter steht ueber der Mauer.
Through positive lookbehind, nothing but the name following the salutation was scoped and therefore changed. Mauer correctly remained as-is, but ueber was not processed. A second pass fixes this:
$ echo 'Frau Loetter steht ueber der Mauer.' | srgn --german-naive '(?<=Frau )\w+' | srgn --german
Frau Lötter steht über der Mauer.
Note
Options and flags pertaining to some "parent" are prefixed with their parent's name, and will imply their parent when given, such that the latter does not need to be passed explicitly. That's why --german-naive is named as it is, and --german needn't be passed.
This behavior might change once clap supports subcommand chaining.
Some branches are undecidable for this modest tool, as it operates without language context. For example, both Busse (busses) and Buße (penance) are legal words. By default, replacements are greedily performed if legal (that's the whole point of srgn, after all), but there's a flag for toggling this behavior:
$ echo 'Busse und Geluebte ' | srgn --german
Buße und Gelübte
$ echo 'Busse ? und Fussgaenger ?♀️' | srgn --german-prefer-original
Busse ? und Fußgänger ?♀️
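To illustrate why a word list is needed, and what --german-prefer-original decides, here is a deliberately tiny, hypothetical Python sketch. It enumerates every keep-or-replace choice for each lowercase digraph and accepts only spellings found in a made-up word list. Unlike srgn, it knows nothing about compounds such as Abenteuergrütze or about capitalized digraphs.

```python
import itertools
import re

# Hypothetical, tiny stand-in for a real word list.
WORDS = {"buße", "busse", "gelübte", "fußgänger", "poeten"}
NATIVE = {"ae": "ä", "oe": "ö", "ue": "ü", "ss": "ß"}


def candidates(word: str):
    """Yield every spelling obtained by keeping or replacing each lowercase digraph."""
    parts = re.split(r"(ae|oe|ue|ss)", word)  # odd indices hold the digraphs
    options = [[part] if i % 2 == 0 else [part, NATIVE[part]] for i, part in enumerate(parts)]
    for combination in itertools.product(*options):
        yield "".join(combination)


def germanize(word: str, prefer_original: bool = False) -> str:
    legal = [c for c in candidates(word) if c.lower() in WORDS]
    if not legal:
        return word  # unknown to the word list: leave as-is
    if prefer_original and word in legal:
        return word
    # Greedy default: the legal spelling with the most replacements
    # (each replacement shortens the word by one character).
    return max(legal, key=lambda c: len(word) - len(c))


print(germanize("Busse"), germanize("Geluebte"))      # Buße Gelübte
print(germanize("Busse", prefer_original=True))        # Busse
print(germanize("Fussgaenger", prefer_original=True))  # Fußgänger
print(germanize("Poeten"))                             # Poeten
```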
Most actions are composable, unless doing so were nonsensical (like for deletion). Their order of application is fixed, so the order of the flags given has no influence (piping multiple runs is an alternative, if needed). Replacements always occur first. Generally, the CLI is designed to prevent misuse and surprises: it prefers crashing to doing something unexpected (which is subjective, of course). Note that lots of combinations are technically possible, but might yield nonsensical results.
Combining actions might look like:
$ echo 'Koeffizienten != Bruecken...' | srgn -Sgu
KOEFFIZIENTEN ≠ BRÜCKEN...
A more narrow scope can be specified, and will apply to all actions equally:
$ echo 'Koeffizienten != Bruecken...' | srgn -Sgu '\b\w{1,8}\b'
Koeffizienten != BRÜCKEN...
The word boundaries are required as otherwise Koeffizienten is matched as Koeffizi and enten. Note how the trailing periods cannot be, for example, squeezed. The required scope of \. would interfere with the given one. Regular piping solves this:
$ echo 'Koeffizienten != Bruecken...' | srgn -Sgu '\b\w{1,8}\b' | srgn -s '\.'
Koeffizienten != BRÜCKEN.
Note: regex escaping (\.) can be circumvented using literal scoping.
The specially treated replacement action is also composable:
$ echo 'Mooood: ????!!!' | srgn -s '\p{Emoji}' '?'
Mooood: ?!!!
Emojis are first all replaced, then squeezed. Notice how nothing else is squeezed.
Scopes are the second driving concept to srgn. In the default case, the main scope is a regular expression. The actions section showcased this use case in some detail, so it's not repeated here. It is given as a first positional argument.
srgn extends this through prepared, language grammar-aware scopes, made possible through the excellent tree-sitter library. It offers a queries feature, which works much like pattern matching against a tree data structure.
srgn comes bundled with a handful of the most useful of these queries. Through its discoverable API (either as a library or via CLI, srgn --help), one can learn of the supported languages and available, prepared queries. Each supported language comes with an escape hatch, allowing you to run your own, custom ad-hoc queries. The hatch comes in the form of --lang-query, where lang is a language such as python. See below for more on this advanced topic.
Note
Language scopes are applied first, so whatever regex aka main scope you pass, it operates on each matched language construct individually.
This section shows examples for some of the prepared queries.
unsafe code (Rust)
One advantage of the unsafe keyword in Rust is its "grepability". However, an rg 'unsafe' will of course surface all string matches (rg '\bunsafe\b' helps to an extent), not just those of the actual Rust language keyword. srgn helps make this more precise. For example:
// Oh no, an unsafe module!
mod scary_unsafe_operations {
    pub unsafe fn unsafe_array_access(arr: &[i32], index: usize) -> i32 {
        // UNSAFE: This function performs unsafe array access without bounds checking
        *arr.get_unchecked(index)
    }

    pub fn call_unsafe_function() {
        let unsafe_numbers = vec![1, 2, 3, 4, 5];
        println!("About to perform an unsafe operation!");
        let result = unsafe {
            // Calling an unsafe function
            unsafe_array_access(&unsafe_numbers, 10)
        };
        println!("Result of unsafe operation: {}", result);
    }
}
can be searched as