grex

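1. What does this tool do?

grex is a library as well as a command-line utility that is meant to simplify the often complicated and tedious task of creating regular expressions. It does so by automatically generating a single regular expression from user-provided test cases. The resulting expression is guaranteed to match the test cases it was generated from.

This project started as a Rust port of the JavaScript tool regexgen written by Devon Govett. Although many further useful features could be added to regexgen, its development apparently ceased several years ago. The plan is now to add these new features to grex, as Rust really shines when it comes to command-line tools. grex offers all the features that regexgen provides, and more.

The philosophy of this project is to generate, by default, the most specific regular expression possible, one that matches exactly the given input and nothing else. More generalized expressions can be created with command-line flags (in the CLI tool) or preprocessing methods (in the library).

The generated expressions are Perl-compatible regular expressions that are also compatible with the regular expression parser in Rust's regex crate. Other regular expression parsers and the corresponding libraries of other programming languages have not been tested so far, but they ought to be mostly compatible as well.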

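2. Do I still need to learn to write regexes then?

Definitely, yes! With the standard settings, grex produces a regular expression that is guaranteed to match only the test cases given as input and nothing else. This has been verified by property tests. However, if the conversion to shorthand character classes such as \w is enabled, the resulting regex matches a much wider range of strings than the test cases alone. Knowing the consequences of this conversion is essential for finding a correct regular expression for your business domain.

grex uses an algorithm that tries to find the shortest possible regex for the given test cases. Very often, though, the resulting expression is still longer or more complex than it needs to be. In such cases, a more compact or elegant regex can only be created by hand. Also, every regular expression engine has different built-in optimizations. grex knows nothing about those and therefore cannot optimize its regexes for a specific engine.

So, please learn how to write regular expressions! Currently, the best use case for grex is to find an initial correct regex, which should then be inspected by hand to see whether further optimizations are possible.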

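3. Current features

literals
character classes
detection of common prefixes and suffixes
detection of repeated substrings and conversion to {min,max} quantifier notation
alternation using the | operator
optionality using the ? quantifier
escaping of non-ASCII characters, with optional conversion of astral code points to surrogate pairs
case-sensitive or case-insensitive matching
capturing or non-capturing groups
optional anchors ^ and $
fully compliant with Unicode Standard 15.0
fully compatible with regex crate 1.9.0+
correct handling of graphemes consisting of multiple Unicode symbols
reading of input strings from the command line or from a file
more readable expressions, indented on multiple lines, with the optional verbose mode
optional syntax highlighting for nicer output in supported terminals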
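
As a rough illustration of the "detection of common prefixes and suffixes" feature, the following sketch computes the longest shared prefix and suffix of a set of test cases using only the Rust standard library. The function names common_prefix and common_suffix are hypothetical and not part of grex, and grex's own detection may work quite differently. This sketch compares individual code points rather than graphemes.

```rust
/// Returns the longest prefix shared by all test cases (hypothetical helper).
fn common_prefix(cases: &[&str]) -> String {
    let (first, rest) = match cases.split_first() {
        Some(parts) => parts,
        None => return String::new(),
    };
    // Start with the whole first case and shrink it case by case.
    let mut prefix: Vec<char> = first.chars().collect();
    for case in rest {
        let shared = prefix
            .iter()
            .zip(case.chars())
            .take_while(|(a, b)| **a == *b)
            .count();
        prefix.truncate(shared);
    }
    prefix.into_iter().collect()
}

/// Returns the longest suffix shared by all test cases by reversing
/// every case, reusing `common_prefix`, and reversing the result back.
fn common_suffix(cases: &[&str]) -> String {
    let reversed: Vec<String> = cases
        .iter()
        .map(|case| case.chars().rev().collect::<String>())
        .collect();
    let reversed_refs: Vec<&str> = reversed.iter().map(|s| s.as_str()).collect();
    common_prefix(&reversed_refs).chars().rev().collect()
}
```

For the test cases "I ♥ cake" and "I ♥ cookies", common_prefix returns "I ♥ c", which is the part that a regex such as ^I ♥ c(?:ookies|ake)$ factors out. For "abc" and "bc", common_suffix returns "bc", matching the shape of ^a?bc$.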

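4. How to install?

4.1 The command-line tool

You can download a self-contained executable for your platform and put it in a place of your choice. Alternatively, pre-compiled 64-bit binaries are available through the package managers Scoop (for Windows), Homebrew (for macOS and Linux), MacPorts (for macOS) and Huber (for macOS, Linux and Windows). Raúl Piracés has contributed a Chocolatey package for Windows.

grex is also hosted on crates.io, the official Rust package registry. If you are a Rust developer and already have the Rust toolchain installed, you can install grex by compiling it from source with cargo, the Rust package manager. Your installation options are therefore:

( brew | cargo | choco | huber | port | scoop ) install grex

4.2 The library

To use grex as a library, simply add it as a dependency to your Cargo.toml file:

[dependencies]
grex = { version = "1.4.5", default-features = false }

The dependency clap is only needed for the command-line tool. Disabling the default features prevents clap from being downloaded and compiled for the library.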

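5. How to use?

Detailed explanations of the available settings are provided in the library section. All settings can be freely combined with each other.

5.1 The command-line tool

Test cases are passed either directly (grex abc) or from a file (grex -f test_cases.txt). grex can also receive its input from Unix pipelines, e.g. cat test_cases.txt | grex -.

The following help output shows all available flags and options: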

 $ grex -h

grex 1.4.5
© 2019-today Peter M. Stahl <[email protected]>
Licensed under the Apache License, Version 2.0
Downloadable from https://crates.io/crates/grex
Source code at https://github.com/pemistahl/grex

grex generates regular expressions from user-provided test cases.

Usage: grex [OPTIONS] {INPUT...|--file <FILE>}

Input:
  [INPUT]...         One or more test cases separated by blank space
  -f, --file <FILE>  Reads test cases on separate lines from a file

Digit Options:
  -d, --digits      Converts any Unicode decimal digit to \d
  -D, --non-digits  Converts any character which is not a Unicode decimal digit to \D

Whitespace Options:
  -s, --spaces      Converts any Unicode whitespace character to \s
  -S, --non-spaces  Converts any character which is not a Unicode whitespace character to \S

Word Options:
  -w, --words      Converts any Unicode word character to \w
  -W, --non-words  Converts any character which is not a Unicode word character to \W

Escaping Options:
  -e, --escape           Replaces all non-ASCII characters with unicode escape sequences
      --with-surrogates  Converts astral code points to surrogate pairs if --escape is set

Repetition Options:
  -r, --repetitions
          Detects repeated non-overlapping substrings and converts them to {min,max} quantifier
          notation
      --min-repetitions <QUANTITY>
          Specifies the minimum quantity of substring repetitions to be converted if --repetitions
          is set [default: 1]
      --min-substring-length <LENGTH>
          Specifies the minimum length a repeated substring must have in order to be converted if
          --repetitions is set [default: 1]

Anchor Options:
      --no-start-anchor  Removes the caret anchor `^` from the resulting regular expression
      --no-end-anchor    Removes the dollar sign anchor `$` from the resulting regular expression
      --no-anchors       Removes the caret and dollar sign anchors from the resulting regular
                         expression

Display Options:
  -x, --verbose   Produces a nicer-looking regular expression in verbose mode
  -c, --colorize  Provides syntax highlighting for the resulting regular expression

Miscellaneous Options:
  -i, --ignore-case     Performs case-insensitive matching, letters match both upper and lower case
  -g, --capture-groups  Replaces non-capturing groups with capturing ones
  -h, --help            Prints help information
  -v, --version         Prints version information

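5.2 The library

5.2.1 Default settings

Test cases are passed either from a collection via RegExpBuilder::from() or from a file via RegExpBuilder::from_file(). If they are read from a file, each test case must be on a separate line. Lines may end with either a newline \n or a carriage return followed by a line feed \r\n.

use grex::RegExpBuilder;

let regexp = RegExpBuilder::from(&["a", "aa", "aaa"]).build();
assert_eq!(regexp, "^a(?:aa?)?$");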
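
The one-test-case-per-line rule, with either \n or \r\n as the line ending, can be reproduced with the standard library alone. The sketch below is an assumption about how a caller might collect test cases from any buffered reader before handing them to RegExpBuilder::from(). The helper read_test_cases is hypothetical and does not show how RegExpBuilder::from_file() is implemented.

```rust
use std::io::{self, BufRead};

/// Collects one test case per line from any buffered reader (hypothetical helper).
/// `BufRead::lines` strips a trailing "\n" as well as a trailing "\r\n".
fn read_test_cases<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    reader.lines().collect()
}
```

Because the function accepts any BufRead, the same code works for a file wrapped in std::io::BufReader, for locked standard input, or for an in-memory std::io::Cursor in tests.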

5.2.2 Conversion to character classes

use grex::RegExpBuilder;

let regexp = RegExpBuilder::from(&["a", "aa", "123"])
    .with_conversion_of_digits()
    .with_conversion_of_words()
    .build();
assert_eq!(regexp, "^(\\d\\d\\d|\\w(?:\\w)?)$");
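
To make the effect of these conversions more tangible, here is a minimal sketch of a per-character generalization step. It is not grex's implementation: the function generalize is hypothetical, it does not escape regex metacharacters, and the standard-library predicates only approximate the Unicode classes behind \d and \w (char::is_numeric, for example, also accepts numeric characters that are not decimal digits).

```rust
/// Replaces each character by a shorthand class where a conversion is enabled
/// (hypothetical helper). Digits take precedence over word characters.
fn generalize(input: &str, convert_digits: bool, convert_words: bool) -> String {
    input
        .chars()
        .map(|c| {
            if convert_digits && c.is_numeric() {
                "\\d".to_string()
            } else if convert_words && (c.is_alphanumeric() || c == '_') {
                "\\w".to_string()
            } else {
                // Characters without an enabled conversion stay literal.
                c.to_string()
            }
        })
        .collect()
}
```

With both conversions enabled, "123" becomes \d\d\d and "aa" becomes \w\w, so the resulting pattern no longer matches only the original test cases but any three digits or any two word characters.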

5.2.3 Conversion of repeated substrings

use grex::RegExpBuilder;

let regexp = RegExpBuilder::from(&["aa", "bcbc", "defdefdef"])
    .with_conversion_of_repetitions()
    .build();
assert_eq!(regexp, "^(?:a{2}|(?:bc){2}|(?:def){3})$");

By default, grex converts every substring in this way that is at least one character long and that is subsequently repeated at least once. You can customize these two parameters if you like.

In the following example, the test case aa is not converted to a{2} because the repeated substring a has a length of 1, but the minimum substring length has been set to 2.

use grex::RegExpBuilder;

let regexp = RegExpBuilder::from(&["aa", "bcbc", "defdefdef"])
    .with_conversion_of_repetitions()
    .with_minimum_substring_length(2)
    .build();
assert_eq!(regexp, "^(?:aa|(?:bc){2}|(?:def){3})$");

In the next example, the minimum number of repetitions is set to 2. Only the test case defdefdef is converted because it is the only one whose substring is repeated twice after its first occurrence.

use grex::RegExpBuilder;

let regexp = RegExpBuilder::from(&["aa", "bcbc", "defdefdef"])
    .with_conversion_of_repetitions()
    .with_minimum_repetitions(2)
    .build();
assert_eq!(regexp, "^(?:bcbc|aa|(?:def){3})$");
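
The interplay of the two parameters can be sketched with a small, deliberately simplified function. It only handles the case where a whole test case consists of one repeated unit, it assumes the input contains no regex metacharacters, and the name convert_repetition is hypothetical. grex itself also detects repetitions inside longer strings, so this is an illustration of the rule rather than its algorithm.

```rust
/// Rewrites `case` as `unit{n}` or `(?:unit){n}` if the whole string is one
/// unit repeated n times (hypothetical helper). A unit occurring n times
/// counts as n - 1 repetitions, as in the description above.
fn convert_repetition(case: &str, min_substring_length: usize, min_repetitions: usize) -> String {
    let chars: Vec<char> = case.chars().collect();
    let total = chars.len();
    // Try the shortest candidate unit first.
    for unit_len in 1..=total / 2 {
        if total % unit_len != 0 || unit_len < min_substring_length {
            continue;
        }
        let unit = &chars[..unit_len];
        let occurrences = total / unit_len;
        let is_repetition = chars.chunks(unit_len).all(|chunk| chunk == unit);
        if is_repetition && occurrences - 1 >= min_repetitions {
            let unit_text: String = unit.iter().collect();
            return if unit_len == 1 {
                format!("{}{{{}}}", unit_text, occurrences)
            } else {
                format!("(?:{}){{{}}}", unit_text, occurrences)
            };
        }
    }
    // No qualifying repetition: keep the test case as a literal.
    case.to_string()
}
```

Called with the defaults (1, 1), the three test cases yield a{2}, (?:bc){2} and (?:def){3}. With a minimum substring length of 2, "aa" stays "aa", and with a minimum of 2 repetitions only "defdefdef" is converted, which mirrors the alternatives in the regexes above.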

5.2.4 Escape non-ASCII characters

use grex::RegExpBuilder;

let regexp = RegExpBuilder::from(&["You smell like 💩."])
    .with_escaping_of_non_ascii_chars(false)
    .build();
assert_eq!(regexp, "^You smell like \\u{1f4a9}\\.$");

Old versions of JavaScript do not support Unicode escape sequences for the astral code planes (range U+010000 to U+10FFFF). To support these symbols in JavaScript regular expressions, they need to be converted to surrogate pairs.

use grex::RegExpBuilder;

let regexp = RegExpBuilder::from(&["You smell like 💩."])
    .with_escaping_of_non_ascii_chars(true)
    .build();
assert_eq!(regexp, "^You smell like \\u{d83d}\\u{dca9}\\.$");
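
Where the two surrogate values come from can be checked with the standard library: char::encode_utf16 splits an astral code point into its high and low surrogate. The function escape_non_ascii below is a hypothetical sketch, not the grex API. It only produces the \u{...} escapes and, unlike grex, leaves regex metacharacters such as the dot untouched.

```rust
/// Replaces every non-ASCII character by a \u{...} escape (hypothetical helper).
/// With `use_surrogate_pairs`, astral code points are written as two UTF-16
/// code units, while characters of the Basic Multilingual Plane stay a single escape.
fn escape_non_ascii(text: &str, use_surrogate_pairs: bool) -> String {
    let mut escaped = String::new();
    for c in text.chars() {
        if c.is_ascii() {
            escaped.push(c);
        } else if use_surrogate_pairs {
            let mut buffer = [0u16; 2];
            for unit in c.encode_utf16(&mut buffer).iter() {
                escaped.push_str(&format!("\\u{{{:x}}}", unit));
            }
        } else {
            escaped.push_str(&format!("\\u{{{:x}}}", c as u32));
        }
    }
    escaped
}
```

For U+1F4A9 this yields \u{1f4a9} without surrogates and \u{d83d}\u{dca9} with them, the same escapes that appear in the two assertions above.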

5.2.5 Case-insensitive matching

The regular expressions that grex generates are case-sensitive by default. Case-insensitive matching can be enabled like so:

use grex::RegExpBuilder;

let regexp = RegExpBuilder::from(&["big", "BIGGER"])
    .with_case_insensitive_matching()
    .build();
assert_eq!(regexp, "(?i)^big(?:ger)?$");

5.2.6 Capturing groups

Non-capturing groups are used by default. Extending the previous example, you can switch to capturing groups instead.

use grex::RegExpBuilder;

let regexp = RegExpBuilder::from(&["big", "BIGGER"])
    .with_case_insensitive_matching()
    .with_capturing_groups()
    .build();
assert_eq!(regexp, "(?i)^big(ger)?$");

5.2.7 Verbose mode

If you find the generated regular expression hard to read, you can enable verbose mode. The expression is then spread over multiple lines and indented to make it easier on the eyes.

use grex::RegExpBuilder;
use indoc::indoc;

let regexp = RegExpBuilder::from(&["a", "b", "bcd"])
    .with_verbose_mode()
    .build();

assert_eq!(regexp, indoc!(
    r#"
    (?x)
    ^
      (?:
        b
        (?:
          cd
        )?
        |
        a
      )
    $"#
));

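5.2.8 Disable anchors

By default, the anchors ^ and $ are put around every generated regular expression to ensure that it matches only the test cases given as input. Often, however, it is desirable to use the generated pattern as part of a larger one. For this purpose, the anchors can be disabled, either separately or both at once.

use grex::RegExpBuilder;

let regexp = RegExpBuilder::from(&["a", "aa", "aaa"])
    .without_anchors()
    .build();
assert_eq!(regexp, "a(?:aa?)?");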

5.3 Examples

The following examples show the various supported regex syntax features:

$ grex a b c
^[a-c]$

$ grex a c d e f
^[ac-f]$

$ grex a b x de
^(?:de|[abx])$

$ grex abc bc
^a?bc$

$ grex a b bc
^(?:bc?|a)$

$ grex [a-z]
^\[a\-z\]$

$ grex -r b ba baa baaa
^b(?:a{1,3})?$

$ grex -r b ba baa baaaa
^b(?:a{1,2}|a{4})?$

$ grex y̆ a z
^(?:y̆|[az])$
Note:
Grapheme y̆ consists of two Unicode symbols:
U+0079 (Latin Small Letter Y)
U+0306 (Combining Breve)

$ grex "I ♥ cake" "I ♥ cookies"
^I ♥ c(?:ookies|ake)$
Note:
Input containing blank space must be
surrounded by quotation marks.
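
The note on the grapheme y̆ above can be verified with the standard library, which iterates over code points rather than graphemes. The helper code_points is a hypothetical name used only for this illustration. Splitting text into graphemes is outside the scope of std and needs a crate such as unicode-segmentation.

```rust
/// Lists the code points of a string in U+XXXX notation (hypothetical helper).
fn code_points(text: &str) -> Vec<String> {
    text.chars()
        .map(|c| format!("U+{:04X}", c as u32))
        .collect()
}
```

Calling code_points("y\u{0306}") returns U+0079 and U+0306, so a tool that treated every code point as a separate symbol could wrongly put the letter and its combining breve into different parts of a character class.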

The string "I ♥♥♥ 36 and ٣ and 💩💩." serves as input for the following examples, written in command-line notation:

$ grex <INPUT>
^I ♥♥♥ 36 and ٣ and 💩💩\.$

$ grex -e <INPUT>
^I \u{2665}\u{2665}\u{2665} 36 and \u{663} and \u{1f4a9}\u{1f4a9}\.$

$ grex -e --with-surrogates <INPUT>
^I \u{2665}\u{2665}\u{2665} 36 and \u{663} and \u{d83d}\u{dca9}\u{d83d}\u{dca9}\.$

$ grex -d <INPUT>
^I ♥♥♥ \d\d and \d and 💩💩\.$

$ grex -s <INPUT>
^I\s♥♥♥\s36\sand\s٣\sand\s💩💩\.$

$ grex -w <INPUT>
^\w ♥♥♥ \w\w \w\w\w \w \w\w\w 💩💩\.$

$ grex -D <INPUT>
^\D\D\D\D\D\D36\D\D\D\D\D٣\D\D\D\D\D\D\D\D$

$ grex -S <INPUT>
^\S \S\S\S \S\S \S\S\S \S \S\S\S \S\S\S$

$ grex -dsw <INPUT>
^\w\s♥♥♥\s\d\d\s\w\w\w\s\d\s\w\w\w\s💩💩\.$

$ grex -dswW <INPUT>
^\w\s\W\W\W\s\d\d\s\w\w\w\s\d\s\w\w\w\s\W\W\W$

$ grex -r <INPUT>
^I ♥{3} 36 and ٣ and 💩{2}\.$

$ grex -er <INPUT>
^I \u{2665}{3} 36 and \u{663} and \u{1f4a9}{2}\.$

$ grex -er --with-surrogates <INPUT>
^I \u{2665}{3} 36 and \u{663} and (?:\u{d83d}\u{dca9}){2}\.$

$ grex -dgr <INPUT>
^I ♥{3} \d(\d and ){2}💩{2}\.$

$ grex -rs <INPUT>
^I\s♥{3}\s36\sand\s٣\sand\s💩{2}\.$

$ grex -rw <INPUT>
^\w ♥{3} \w(?:\w \w{3} ){2}💩{2}\.$

$ grex -Dr <INPUT>
^\D{6}36\D{5}٣\D{8}$

$ grex -rS <INPUT>
^\S \S(?:\S{2} ){2}\S{3} \S \S{3} \S{3}$

$ grex -rW <INPUT>
^I\W{5}36\Wand\W٣\Wand\W{4}$

$ grex -drsw <INPUT>
^\w\s♥{3}\s\d(?:\d\s\w{3}\s){2}💩{2}\.$

$ grex -drswW <INPUT>
^\w\s\W{3}\s\d(?:\d\s\w{3}\s){2}\W{3}$

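6. How to build?

To build the source code yourself, you need the stable Rust toolchain installed on your machine so that cargo, the Rust package manager, is available. Please note: Rust >= 1.70.0 is required to build the CLI. For the library part, Rust < 1.70.0 is sufficient.

git clone https://github.com/pemistahl/grex.git
cd grex
cargo build

The source code comes with an extensive test suite consisting of unit tests, integration tests and property tests. To run them, simply say:

cargo test

Benchmarks measuring the performance of several settings can be run with:

cargo bench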

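7. Python extension module

With the help of PyO3 and Maturin, the library has been compiled to a Python extension module so that it can be used within any Python software as well. It is available in the Python Package Index and can be installed with:

pip install grex

To build the Python extension module yourself, create a virtual environment and install Maturin.

python -m venv /path/to/virtual/environment
source /path/to/virtual/environment/bin/activate
pip install maturin
maturin build

The Python library contains a class named RegExpBuilder that can be imported like so:

from grex import RegExpBuilder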

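8. WebAssembly support

The library can be compiled to WebAssembly (WASM), which allows grex to be used in any JavaScript-based project, be it in the browser or in a back end running on Node.js.

The easiest way to compile is to use wasm-pack. After installing it, you can build the library with the web target so that it can be used directly in the browser:

wasm-pack build --target web

This creates a directory named pkg at the top level of this repository, containing the compiled wasm files and the JavaScript and TypeScript bindings. In an HTML file, you can then call grex as follows, for instance:

<script type="module">
    import init, { RegExpBuilder } from "./pkg/grex.js";

    init().then(_ => {
        alert(RegExpBuilder.from(["hello", "world"]).build());
    });
</script>

There are also some integration tests available for Node.js and for the browsers Chrome, Firefox and Safari. To run them, simply say:

wasm-pack test --node --headless --chrome --firefox --safari

If the tests fail to start in Safari, you need to enable Safari's web driver first by running:

sudo safaridriver --enable

The output of wasm-pack will be hosted in a separate repository, which allows further JavaScript-related configuration, tests and documentation to be added. grex will then be added to the npm registry as well, allowing for easy download and installation within every JavaScript or TypeScript project.

There is a demo website where you can give grex a try.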

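9. How does it work?

A deterministic finite automaton (DFA) is created from the input strings.
The number of states and of transitions between states in the DFA is reduced by applying Hopcroft's DFA minimization algorithm.
The minimized DFA is expressed as a system of linear equations, which is solved with Brzozowski's algebraic method, resulting in the final regular expression.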
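
The first of these steps can be sketched in a few lines. The code below builds a trie-shaped DFA from the test cases and checks whether a string is accepted. It is only an assumption about what such an automaton could look like, not grex's data structure: the type Dfa and its methods are hypothetical, transitions are per code point, and the minimization and equation-solving steps are omitted entirely.

```rust
use std::collections::HashMap;

/// A trie-shaped DFA (hypothetical type). State 0 is the start state.
struct Dfa {
    transitions: Vec<HashMap<char, usize>>,
    accepting: Vec<bool>,
}

impl Dfa {
    /// Adds one path per test case and marks its last state as accepting.
    fn from_test_cases(test_cases: &[&str]) -> Self {
        let mut dfa = Dfa {
            transitions: vec![HashMap::new()],
            accepting: vec![false],
        };
        for case in test_cases {
            let mut state = 0;
            for symbol in case.chars() {
                let existing = dfa.transitions[state].get(&symbol).copied();
                state = match existing {
                    Some(next) => next,
                    None => {
                        // Unknown symbol in this state: create a fresh state.
                        let next = dfa.transitions.len();
                        dfa.transitions.push(HashMap::new());
                        dfa.accepting.push(false);
                        dfa.transitions[state].insert(symbol, next);
                        next
                    }
                };
            }
            dfa.accepting[state] = true;
        }
        dfa
    }

    /// Follows the transitions for `input` and reports whether it ends in an accepting state.
    fn accepts(&self, input: &str) -> bool {
        let mut state = 0;
        for symbol in input.chars() {
            match self.transitions[state].get(&symbol) {
                Some(&next) => state = next,
                None => return false,
            }
        }
        self.accepting[state]
    }

    /// Number of states before any minimization.
    fn state_count(&self) -> usize {
        self.transitions.len()
    }
}
```

For the test cases a, aa and aaa, this automaton has four states and accepts exactly those three strings, which is the language that the regex ^a(?:aa?)?$ describes. A minimization step would then merge equivalent states of larger automata before a regex is derived.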

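10. What's next for version 1.5.0?

Take a look at the planned issues.

11. Contributions

If you would like to contribute something to grex, I encourage you to do so. Do you have ideas for cool features? Or have you found any bugs so far? Feel free to open an issue or send a pull request. It is very much appreciated. :-)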
