grex download - grex source code download

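1. What does this tool do?

grex is a library as well as a command-line utility that is meant to simplify the often complicated and tedious task of creating regular expressions. It does so by automatically generating a single regular expression from user-provided test cases. The resulting expression is guaranteed to match the test cases it was generated from.

This project started as a Rust port of the JavaScript tool regexgen written by Devon Govett. Although many further useful features could be added to regexgen, its development apparently ceased several years ago. The plan is now to add these new features to grex, as Rust really shines when it comes to command-line tools. grex offers all the features that regexgen provides, and more.

The philosophy of this project is to generate the most specific regular expression possible by default, one that exactly matches the given input and nothing else. With command-line flags (in the CLI tool) or preprocessing methods (in the library), more generalized expressions can be created.

The produced expressions are Perl-compatible regular expressions which are also compatible with the regular expression parser in Rust's regex crate. Other regular expression parsers or the respective libraries of other programming languages have not been tested so far, but they ought to be mostly compatible as well.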

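2. Do I still need to learn to write regexes then?

Definitely, yes! Using the standard settings, grex produces a regular expression that is guaranteed to match only the test cases given as input and nothing else. This has been verified by property tests. However, if the conversion to shorthand character classes such as \w is enabled, the resulting regex matches a much wider scope of test cases. Knowledge about the consequences of this conversion is essential for finding a correct regular expression for your business domain.

grex uses an algorithm that tries to find the shortest possible regex for the given test cases. Very often though, the resulting expression is still longer or more complex than it needs to be. In such cases, a more compact or elegant regex can only be created by hand. Also, every regular expression engine has different built-in optimizations. grex does not know anything about those and therefore cannot optimize its regexes for a specific engine.

So, please learn how to write regular expressions! Currently, the best use case for grex is to find an initial correct regex, which should then be inspected by hand to see whether further optimizations are possible.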

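3. Current features

literals
character classes
detection of common prefixes and suffixes
detection of repeated substrings and conversion to {min,max} quantifier notation
alternation using the | operator
optionality using the ? quantifier
escaping of non-ASCII characters, with optional conversion of astral code points to surrogate pairs
case-sensitive or case-insensitive matching
capturing or non-capturing groups
optional anchors ^ and $
fully compliant with Unicode Standard 15.0
fully compatible with regex crate 1.9.0+
correct handling of graphemes consisting of multiple Unicode symbols
reading of input strings from the command line or from a file
more readable expressions, indented on multiple lines, when using the optional verbose mode
optional syntax highlighting for nicer output in supported terminals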

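4. How to install?

4.1 The command-line tool

You can download the self-contained executable for your platform and put it in a place of your choice. Alternatively, pre-compiled 64-bit binaries are available within the package managers Scoop (for Windows), Homebrew (for macOS and Linux), MacPorts (for macOS) and Huber (for macOS, Linux and Windows). Raúl Piracés has contributed a Chocolatey Windows package.

grex is also hosted on crates.io, the official Rust package registry. If you are a Rust developer and already have the Rust toolchain installed, you can install it by compiling from source using cargo, the Rust package manager. So the summary of your installation options is: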

 ( brew | cargo | choco | huber | port | scoop ) install grex

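4.2 The library

In order to use grex as a library, simply add it as a dependency to your Cargo.toml file:

[dependencies]
grex = { version = "1.4.5", default-features = false }

The dependency clap is only needed for the command-line tool. Disabling the default features prevents clap from being downloaded and compiled for the library.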

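5. How to use?

Detailed explanations of the available settings are provided in the library section. All settings can be freely combined with each other.

5.1 The command-line tool

Test cases are passed either directly (grex abc) or from a file (grex -f test_cases.txt). grex is also able to receive its input from Unix pipelines, e.g. cat test_cases.txt | grex -

The following table shows all available flags and options: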

 $ grex -h

grex 1.4.5
© 2019-today Peter M. Stahl <[email protected]>
Licensed under the Apache License, Version 2.0
Downloadable from https://crates.io/crates/grex
Source code at https://github.com/pemistahl/grex

grex generates regular expressions from user-provided test cases.

Usage: grex [OPTIONS] {INPUT...|--file <FILE>}

Input:
  [INPUT]...         One or more test cases separated by blank space
  -f, --file <FILE>  Reads test cases on separate lines from a file

Digit Options:
  -d, --digits      Converts any Unicode decimal digit to \d
  -D, --non-digits  Converts any character which is not a Unicode decimal digit to \D

Whitespace Options:
  -s, --spaces      Converts any Unicode whitespace character to \s
  -S, --non-spaces  Converts any character which is not a Unicode whitespace character to \S

Word Options:
  -w, --words      Converts any Unicode word character to \w
  -W, --non-words  Converts any character which is not a Unicode word character to \W

Escaping Options:
  -e, --escape           Replaces all non-ASCII characters with unicode escape sequences
      --with-surrogates  Converts astral code points to surrogate pairs if --escape is set

Repetition Options:
  -r, --repetitions
          Detects repeated non-overlapping substrings and converts them to {min,max} quantifier
          notation
      --min-repetitions <QUANTITY>
          Specifies the minimum quantity of substring repetitions to be converted if --repetitions
          is set [default: 1]
      --min-substring-length <LENGTH>
          Specifies the minimum length a repeated substring must have in order to be converted if
          --repetitions is set [default: 1]

Anchor Options:
      --no-start-anchor  Removes the caret anchor `^` from the resulting regular expression
      --no-end-anchor    Removes the dollar sign anchor `$` from the resulting regular expression
      --no-anchors       Removes the caret and dollar sign anchors from the resulting regular
                         expression

Display Options:
  -x, --verbose   Produces a nicer-looking regular expression in verbose mode
  -c, --colorize  Provides syntax highlighting for the resulting regular expression

Miscellaneous Options:
  -i, --ignore-case     Performs case-insensitive matching, letters match both upper and lower case
  -g, --capture-groups  Replaces non-capturing groups with capturing ones
  -h, --help            Prints help information
  -v, --version         Prints version information

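5.2 The library

5.2.1 Default settings

Test cases are passed either from a collection via RegExpBuilder::from() or from a file via RegExpBuilder::from_file(). If read from a file, each test case must be on a separate line. Lines may end with either a newline \n or a carriage return followed by a line feed \r\n.

use grex::RegExpBuilder;

let regexp = RegExpBuilder::from(&["a", "aa", "aaa"]).build();
assert_eq!(regexp, "^a(?:aa?)?$");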
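
The line-ending rule can be illustrated without grex itself. The following is a minimal sketch using only the Rust standard library; parse_test_cases is a hypothetical helper and not part of the grex API. How RegExpBuilder::from_file() treats empty lines is not stated here, so the sketch simply keeps them.

```rust
/// Hypothetical helper (not part of grex): splits file contents into
/// test cases, one per line. `str::lines` accepts both `\n` and `\r\n`
/// as line terminators and strips them from the returned slices.
fn parse_test_cases(contents: &str) -> Vec<&str> {
    contents.lines().collect()
}

fn main() {
    // Mixed line endings: the first line ends with \r\n, the others with \n.
    let contents = "a\r\naa\naaa\n";
    let test_cases = parse_test_cases(contents);
    println!("{:?}", test_cases);
}
```

A vector collected this way could then be handed to RegExpBuilder::from(), assuming the file fits into memory.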

5.2.2 Conversion to character classes

use grex::RegExpBuilder;

let regexp = RegExpBuilder::from(&["a", "aa", "123"])
    .with_conversion_of_digits()
    .with_conversion_of_words()
    .build();
assert_eq!(regexp, "^(\\d\\d\\d|\\w(?:\\w)?)$");

5.2.3 Conversion of repeated substrings

use grex::RegExpBuilder;

let regexp = RegExpBuilder::from(&["aa", "bcbc", "defdefdef"])
    .with_conversion_of_repetitions()
    .build();
assert_eq!(regexp, "^(?:a{2}|(?:bc){2}|(?:def){3})$");

By default, grex converts every substring in this way that is at least one character long and is subsequently repeated at least once. You can customize these two parameters if you like.

In the following example, the test case aa is not converted to a{2} because the repeated substring a has a length of 1, but the minimum substring length has been set to 2.

use grex::RegExpBuilder;

let regexp = RegExpBuilder::from(&["aa", "bcbc", "defdefdef"])
    .with_conversion_of_repetitions()
    .with_minimum_substring_length(2)
    .build();
assert_eq!(regexp, "^(?:aa|(?:bc){2}|(?:def){3})$");

With the minimum number of repetitions set to 2 in the next example, only the test case defdefdef is converted because it is the only one in which the substring is repeated twice.

use grex::RegExpBuilder;

let regexp = RegExpBuilder::from(&["aa", "bcbc", "defdefdef"])
    .with_conversion_of_repetitions()
    .with_minimum_repetitions(2)
    .build();
assert_eq!(regexp, "^(?:bcbc|aa|(?:def){3})$");
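
To make the two thresholds more tangible, here is a minimal sketch using only the Rust standard library. It is not grex's actual algorithm: the hypothetical function repeated_unit only handles the simplified case where the whole test case is one substring repeated back to back, whereas grex also detects repetitions inside longer strings. It assumes that "repeated N times" means the substring occurs N + 1 times in total, which is consistent with the three examples above.

```rust
/// Illustrative only (hypothetical helper, not grex's algorithm): returns the
/// shortest substring and its number of occurrences if `input` consists
/// entirely of that substring repeated back to back and both thresholds hold.
fn repeated_unit(
    input: &str,
    min_substring_length: usize,
    min_repetitions: usize,
) -> Option<(String, usize)> {
    let chars: Vec<char> = input.chars().collect();
    let n = chars.len();
    // A unit longer than n / 2 cannot occur twice, so it is never "repeated".
    for unit_len in min_substring_length.max(1)..=n / 2 {
        if n % unit_len != 0 {
            continue;
        }
        let unit = &chars[..unit_len];
        let occurrences = n / unit_len;
        // "Repeated k times" is read as k + 1 occurrences in total.
        if occurrences - 1 >= min_repetitions
            && chars.chunks(unit_len).all(|chunk| chunk == unit)
        {
            return Some((unit.iter().collect(), occurrences));
        }
    }
    None
}

fn main() {
    for case in ["aa", "bcbc", "defdefdef"] {
        // Same thresholds as the second example: minimum substring length 2.
        match repeated_unit(case, 2, 1) {
            Some((unit, count)) => println!("{case} -> (?:{unit}){{{count}}}"),
            None => println!("{case} -> {case}"),
        }
    }
}
```

With a minimum substring length of 2, aa is left untouched because its only repeating unit has length 1; with a minimum of 2 repetitions, bcbc is left untouched because bc occurs only twice.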

5.2.4 Escape non-ASCII characters

use grex::RegExpBuilder;

let regexp = RegExpBuilder::from(&["You smell like 💩."])
    .with_escaping_of_non_ascii_chars(false)
    .build();
assert_eq!(regexp, "^You smell like \\u{1f4a9}\\.$");

Old versions of JavaScript do not support Unicode escape sequences for the astral code planes (range U+010000 to U+10FFFF). In order to support these symbols in JavaScript regular expressions, they have to be converted to surrogate pairs.

use grex::RegExpBuilder;

let regexp = RegExpBuilder::from(&["You smell like 💩."])
    .with_escaping_of_non_ascii_chars(true)
    .build();
assert_eq!(regexp, "^You smell like \\u{d83d}\\u{dca9}\\.$");
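
The surrogate pair in the expected output can be reproduced with the Rust standard library alone. The sketch below is not taken from grex's source; to_surrogate_pair and escape_with_surrogates are hypothetical helper names. It relies on char::encode_utf16, which yields two UTF-16 code units exactly for code points above U+FFFF.

```rust
/// Hypothetical helper (not grex's implementation): returns the UTF-16
/// surrogate pair of `c`, or `None` if `c` fits into one UTF-16 code unit.
fn to_surrogate_pair(c: char) -> Option<(u16, u16)> {
    let mut buffer = [0u16; 2];
    let units = c.encode_utf16(&mut buffer);
    if units.len() == 2 {
        Some((units[0], units[1]))
    } else {
        None
    }
}

/// Formats one character as escape sequence(s) in the style shown above,
/// using a surrogate pair for astral code points.
fn escape_with_surrogates(c: char) -> String {
    match to_surrogate_pair(c) {
        Some((high, low)) => format!("\\u{{{:x}}}\\u{{{:x}}}", high, low),
        None => format!("\\u{{{:x}}}", c as u32),
    }
}

fn main() {
    // U+1F4A9 lies in an astral plane, U+2665 does not.
    println!("{}", escape_with_surrogates('\u{1f4a9}'));
    println!("{}", escape_with_surrogates('\u{2665}'));
}
```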

5.2.5 Case-insensitive matching

The regular expressions that grex generates are case-sensitive by default. Case-insensitive matching can be enabled like so:

use grex::RegExpBuilder;

let regexp = RegExpBuilder::from(&["big", "BIGGER"])
    .with_case_insensitive_matching()
    .build();
assert_eq!(regexp, "(?i)^big(?:ger)?$");

5.2.6 Capturing groups

Non-capturing groups are used by default. Extending the previous example, you can switch to capturing groups instead.

use grex::RegExpBuilder;

let regexp = RegExpBuilder::from(&["big", "BIGGER"])
    .with_case_insensitive_matching()
    .with_capturing_groups()
    .build();
assert_eq!(regexp, "(?i)^big(ger)?$");

5.2.7 Verbose mode

If you find the generated regular expression hard to read, you can enable verbose mode. The expression is then put on multiple lines and indented to make it easier on the eyes.

use grex::RegExpBuilder;
use indoc::indoc;

let regexp = RegExpBuilder::from(&["a", "b", "bcd"])
    .with_verbose_mode()
    .build();

assert_eq!(regexp, indoc!(
    r#"
    (?x)
    ^
      (?:
        b
        (?:
          cd
        )?
        |
        a
      )
    $"#
));

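5.2.8 Disable anchors

By default, the anchors ^ and $ are put around every generated regular expression in order to ensure that it matches only the test cases given as input. Often enough, however, it is desired to use the generated pattern as part of a larger one. For this purpose, the anchors can be disabled, either separately or both at once.

use grex::RegExpBuilder;

let regexp = RegExpBuilder::from(&["a", "aa", "aaa"])
    .without_anchors()
    .build();
assert_eq!(regexp, "a(?:aa?)?");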

5.3 Examples

The following examples show the various supported regex syntax features:

$ grex a b c
^[a-c]$

$ grex a c d e f
^[ac-f]$

$ grex a b x de
^(?:de|[abx])$

$ grex abc bc
^a?bc$

$ grex a b bc
^(?:bc?|a)$

$ grex [a-z]
^\[a\-z\]$

$ grex -r b ba baa baaa
^b(?:a{1,3})?$

$ grex -r b ba baa baaaa
^b(?:a{1,2}|a{4})?$

$ grex y̆ a z
^(?:y̆|[az])$
Note:
Grapheme y̆ consists of two Unicode symbols:
U+0079 (Latin Small Letter Y)
U+0306 (Combining Breve)

$ grex "I ♥ cake" "I ♥ cookies"
^I ♥ c(?:ookies|ake)$
Note:
Input containing blank space must be 
surrounded by quotation marks.

The string "I ♥♥♥ 36 and ٣ and 💩💩." serves as input for the following examples, written in command-line notation:

$ grex <INPUT>
^I ♥♥♥ 36 and ٣ and 💩💩\.$

$ grex -e <INPUT>
^I \u{2665}\u{2665}\u{2665} 36 and \u{663} and \u{1f4a9}\u{1f4a9}\.$

$ grex -e --with-surrogates <INPUT>
^I \u{2665}\u{2665}\u{2665} 36 and \u{663} and \u{d83d}\u{dca9}\u{d83d}\u{dca9}\.$

$ grex -d <INPUT>
^I ♥♥♥ \d\d and \d and 💩💩\.$

$ grex -s <INPUT>
^I\s♥♥♥\s36\sand\s٣\sand\s💩💩\.$

$ grex -w <INPUT>
^\w ♥♥♥ \w\w \w\w\w \w \w\w\w 💩💩\.$

$ grex -D <INPUT>
^\D\D\D\D\D\D36\D\D\D\D\D٣\D\D\D\D\D\D\D\D$

$ grex -S <INPUT>
^\S \S\S\S \S\S \S\S\S \S \S\S\S \S\S\S$

$ grex -dsw <INPUT>
^\w\s♥♥♥\s\d\d\s\w\w\w\s\d\s\w\w\w\s💩💩\.$

$ grex -dswW <INPUT>
^\w\s\W\W\W\s\d\d\s\w\w\w\s\d\s\w\w\w\s\W\W\W$

$ grex -r <INPUT>
^I ♥{3} 36 and ٣ and 💩{2}\.$

$ grex -er <INPUT>
^I \u{2665}{3} 36 and \u{663} and \u{1f4a9}{2}\.$

$ grex -er --with-surrogates <INPUT>
^I \u{2665}{3} 36 and \u{663} and (?:\u{d83d}\u{dca9}){2}\.$

$ grex -dgr <INPUT>
^I ♥{3} \d(\d and ){2}💩{2}\.$

$ grex -rs <INPUT>
^I\s♥{3}\s36\sand\s٣\sand\s💩{2}\.$

$ grex -rw <INPUT>
^\w ♥{3} \w(?:\w \w{3} ){2}💩{2}\.$

$ grex -Dr <INPUT>
^\D{6}36\D{5}٣\D{8}$

$ grex -rS <INPUT>
^\S \S(?:\S{2} ){2}\S{3} \S \S{3} \S{3}$

$ grex -rW <INPUT>
^I\W{5}36\Wand\W٣\Wand\W{4}$

$ grex -drsw <INPUT>
^\w\s♥{3}\s\d(?:\d\s\w{3}\s){2}💩{2}\.$

$ grex -drswW <INPUT>
^\w\s\W{3}\s\d(?:\d\s\w{3}\s){2}\W{3}$

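6. How to build?

In order to build the source code yourself, you need the stable Rust toolchain installed on your machine so that cargo, the Rust package manager, is available. Please note: Rust >= 1.70.0 is required to build the CLI. For the library part, Rust < 1.70.0 is sufficient.

git clone https://github.com/pemistahl/grex.git
cd grex
cargo build

The source code is accompanied by an extensive test suite consisting of unit tests, integration tests and property tests. To run them, simply say:

cargo test

Benchmarks measuring the performance of several settings can be run with: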

cargo bench

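7. Python extension module

With the help of PyO3 and Maturin, the library has been compiled to a Python extension module so that it can be used within any Python software as well. It is available in the Python Package Index and can be installed with:

pip install grex

To build the Python extension module yourself, create a virtual environment and install Maturin.

python -m venv /path/to/virtual/environment
source /path/to/virtual/environment/bin/activate
pip install maturin
maturin build

The Python library contains a class named RegExpBuilder that can be imported like so: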

 from grex import RegExpBuilder

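8. WebAssembly support

This library can be compiled to WebAssembly (WASM), which allows grex to be used in any JavaScript-based project, be it in the browser or in a back end running on Node.js.

The easiest way to compile is to use wasm-pack. After installing it, you can build the library with the web target so that it can be used directly in the browser:

wasm-pack build --target web

This creates a directory named pkg at the top level of this repository, containing the compiled wasm files and the JavaScript and TypeScript bindings. In an HTML file, you can then call grex like this, for instance:

<script type="module">
    import init, { RegExpBuilder } from "./pkg/grex.js";

    init().then(_ => {
        alert(RegExpBuilder.from(["hello", "world"]).build());
    });
</script>

There are also some integration tests available for Node.js and for the browsers Chrome, Firefox and Safari. To run them, simply say:

wasm-pack test --node --headless --chrome --firefox --safari

If the tests fail to start in Safari, you need to enable Safari's web driver first by running:

sudo safaridriver --enable

The output of wasm-pack will be hosted in a separate repository, which allows further JavaScript-related configuration, tests and documentation to be added. grex will then be added to the npm registry as well, allowing for easy download and installation within every JavaScript or TypeScript project.

There is a demo website available where you can give grex a try.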

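9. How does it work?

A deterministic finite automaton (DFA) is created from the input strings.
The number of states and transitions between states in the DFA is reduced by applying Hopcroft's DFA minimization algorithm.
The minimized DFA is expressed as a system of linear equations which are solved with Brzozowski's algebraic method, resulting in the final regular expression.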
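
The first of these steps can be sketched in a few lines of standard-library Rust. The code below is an illustration under simplifying assumptions, not grex's implementation: the types State and Dfa and their methods are hypothetical names, the automaton is built as a plain prefix tree over chars (grex works on graphemes), and neither the Hopcroft minimization nor the conversion to a regular expression is shown.

```rust
use std::collections::HashMap;

/// One DFA state: outgoing transitions by character, plus an accepting flag.
#[derive(Default)]
struct State {
    transitions: HashMap<char, usize>,
    accepting: bool,
}

/// Hypothetical, unminimized DFA shaped like a prefix tree; state 0 is the start.
struct Dfa {
    states: Vec<State>,
}

impl Dfa {
    /// Adds one path per test case, sharing states for common prefixes.
    fn from_test_cases(cases: &[&str]) -> Self {
        let mut states = vec![State::default()];
        for case in cases {
            let mut current = 0;
            for ch in case.chars() {
                let existing = states[current].transitions.get(&ch).copied();
                current = match existing {
                    Some(next) => next,
                    None => {
                        states.push(State::default());
                        let next = states.len() - 1;
                        states[current].transitions.insert(ch, next);
                        next
                    }
                };
            }
            states[current].accepting = true;
        }
        Dfa { states }
    }

    /// Follows the transitions for `input`; a missing transition rejects.
    fn accepts(&self, input: &str) -> bool {
        let mut current = 0;
        for ch in input.chars() {
            match self.states[current].transitions.get(&ch) {
                Some(&next) => current = next,
                None => return false,
            }
        }
        self.states[current].accepting
    }
}

fn main() {
    let dfa = Dfa::from_test_cases(&["a", "aa", "aaa"]);
    println!("states: {}", dfa.states.len());
    println!("accepts \"aa\": {}", dfa.accepts("aa"));
    println!("accepts \"aaaa\": {}", dfa.accepts("aaaa"));
}
```

Such an automaton accepts exactly the given test cases, which mirrors the default guarantee that the generated expression matches the input and nothing else. Minimization would then merge equivalent states, for example shared suffixes, before the expression is derived.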

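10. What's next for version 1.5.0?

Take a look at the planned issues.

11. Contributions

In case you want to contribute something to grex, I encourage you to do so. Do you have ideas for cool features? Or have you found any bugs so far? Feel free to open an issue or send a pull request. It's very much appreciated. :-)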
