liblevenshtein

Java

A library for generating finite-state transducers based on Levenshtein automata.

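Levenshtein transducers accept a query term and return all terms in a dictionary that are within n spelling errors of it. They constitute a class of spelling correctors that is highly efficient in both space and time, and they work very well when you do not need context to make suggestions. Forget about performing a linear scan over your dictionary to find all terms sufficiently close to the user's query with a quadratic implementation of the Levenshtein or Damerau-Levenshtein distance: these babies find all matching terms in your dictionary in time linear in the length of the query term (not the size of the dictionary, the length of the query term).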
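For comparison, here is a minimal sketch of the baseline that the paragraph above tells you to forget: a quadratic dynamic-programming Levenshtein distance applied to every term in the dictionary. The class and method names (NaiveScan, levenshtein, scan) are hypothetical and are not part of liblevenshtein; the sketch only illustrates why the cost of this approach grows with the size of the dictionary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical baseline for illustration only; not part of liblevenshtein.
class NaiveScan {

  // Classic dynamic-programming Levenshtein distance.
  // Runs in O(|a| * |b|) time, i.e. quadratic in the term length.
  static int levenshtein(final String a, final String b) {
    int[] prev = new int[b.length() + 1];
    int[] curr = new int[b.length() + 1];
    for (int j = 0; j <= b.length(); j++) {
      prev[j] = j; // turning the empty prefix of a into b[0..j) takes j insertions
    }
    for (int i = 1; i <= a.length(); i++) {
      curr[0] = i; // turning a[0..i) into the empty string takes i deletions
      for (int j = 1; j <= b.length(); j++) {
        final int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        curr[j] = Math.min(
          Math.min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
          prev[j - 1] + cost);                    // match or substitution
      }
      final int[] tmp = prev;
      prev = curr;
      curr = tmp;
    }
    return prev[b.length()];
  }

  // Linear scan: one distance computation per dictionary term,
  // so the total work is proportional to the size of the dictionary.
  static List<String> scan(
      final List<String> dictionary, final String query, final int maxDistance) {
    final List<String> matches = new ArrayList<>();
    for (final String term : dictionary) {
      if (levenshtein(query, term) <= maxDistance) {
        matches.add(term);
      }
    }
    return matches;
  }

  public static void main(final String[] args) {
    final List<String> dictionary =
      Arrays.asList("the", "be", "to", "of", "and", "for", "not");
    System.out.println(scan(dictionary, "foo", 1)); // prints [for]
  }
}
```

A transducer answers the same question without visiting terms that cannot match, which is where the independence from the dictionary size comes from.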

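If you need context, take the candidates generated by the transducer as a starting point and plug them into whatever model you are using for context (for example, by selecting the sequence of terms that has the greatest probability of appearing together).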
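One way to picture that last step is the sketch below, which picks the candidate that most often follows the previous word according to a table of bigram counts. Everything in it is an assumption made for illustration: the ContextRanker class, its methods, and the counts are hypothetical, and a real context model would be considerably richer. The candidate list is typed in by hand rather than produced by the library.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical context model for illustration only; not part of liblevenshtein.
class ContextRanker {

  // Bigram counts keyed by "previous current". The counts used in main are made up.
  private final Map<String, Integer> bigramCounts = new HashMap<>();

  // Records that `current` was seen `count` times directly after `previous`.
  void observe(final String previous, final String current, final int count) {
    bigramCounts.merge(previous + " " + current, count, Integer::sum);
  }

  // Returns the candidate seen most often after `previous`.
  // Ties keep the earliest candidate; an empty list yields null.
  String best(final String previous, final List<String> candidates) {
    String best = null;
    int bestCount = -1;
    for (final String candidate : candidates) {
      final int count = bigramCounts.getOrDefault(previous + " " + candidate, 0);
      if (count > bestCount) {
        best = candidate;
        bestCount = count;
      }
    }
    return best;
  }

  public static void main(final String[] args) {
    final ContextRanker ranker = new ContextRanker();
    ranker.observe("wait", "for", 40);
    ranker.observe("wait", "on", 5);

    // Spelling candidates for a misspelled word, listed by hand for this sketch.
    final List<String> candidates =
      Arrays.asList("do", "of", "on", "to", "for", "not", "you");
    System.out.println(ranker.best("wait", candidates)); // prints for
  }
}
```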

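For a quick demonstration, please visit the GitHub page. There is also a command-line interface, liblevenshtein-java-cli. Please see its README.md for acquisition and usage information.

The library is currently written in Java, CoffeeScript, and JavaScript, but I will be porting it to other languages soon. If there is a specific language you would like to see it in, or a package-management system you would like it deployed to, let me know.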

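Branches

Branch	Description
master	Latest, development source
release	Latest, release source
release-3.x	Latest, release source for version 3.x
release-2.x	Latest, release source for version 2.x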

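Project Management

Issues are managed on waffle.io. Below, you will find a graph of the rate at which I have been closing them.

Please visit Bountysource to pledge your support to ongoing issues.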

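Documentation

When it comes to documentation, you have several options:

Wiki
Javadoc
Source Code

Basic Usage:

Minimum Java Version

liblevenshtein has been developed against Java ≥ 1.8. It will not work with prior versions.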

Installation

Maven

<dependency>
  <groupId>com.github.universal-automata</groupId>
  <artifactId>liblevenshtein</artifactId>
  <version>3.0.0</version>
</dependency>

Apache Buildr

'com.github.universal-automata:liblevenshtein:jar:3.0.0'

Apache Ivy

<dependency org="com.github.universal-automata" name="liblevenshtein" rev="3.0.0" />

Groovy Grape

@Grapes(
@Grab(group='com.github.universal-automata', module='liblevenshtein', version='3.0.0')
)

Gradle/Grails

compile 'com.github.universal-automata:liblevenshtein:3.0.0'

Scala SBT

libraryDependencies += "com.github.universal-automata" % "liblevenshtein" % "3.0.0"

Leiningen

[com.github.universal-automata/liblevenshtein "3.0.0"]

Git

% git clone --progress git@github.com:universal-automata/liblevenshtein-java.git
Cloning into 'liblevenshtein-java'...
remote: Counting objects: 8117, done.        
remote: Compressing objects: 100% (472/472), done.        
remote: Total 8117 (delta 352), reused 0 (delta 0), pack-reused 7619        
Receiving objects: 100% (8117/8117), 5.52 MiB | 289.00 KiB/s, done.
Resolving deltas: 100% (5366/5366), done.
Checking connectivity... done.

% cd liblevenshtein-java
% git pull --progress
Already up-to-date.

% git fetch --progress --tags
% git checkout --progress 3.0.0
Note: checking out '3.0.0'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at 4f0f172... pushd and popd silently

% git submodule init
% git submodule update

Usage

Let's say you have the following content in a plain text file called top-20-most-common-english-words.txt (note that the file has one term per line):

the
be
to
of
and
a
in
that
have
I
it
for
not
on
with
he
as
you
do
at

The following provides you a way to query its content:

import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import com.github.liblevenshtein.collection.dictionary.SortedDawg;
import com.github.liblevenshtein.serialization.PlainTextSerializer;
import com.github.liblevenshtein.serialization.ProtobufSerializer;
import com.github.liblevenshtein.serialization.Serializer;
import com.github.liblevenshtein.transducer.Algorithm;
import com.github.liblevenshtein.transducer.Candidate;
import com.github.liblevenshtein.transducer.ITransducer;
import com.github.liblevenshtein.transducer.factory.TransducerBuilder;

// ...

final SortedDawg dictionary;
final Path dictionaryPath =
  Paths.get("/path/to/top-20-most-common-english-words.txt");
try (final InputStream stream = Files.newInputStream(dictionaryPath)) {
  // The PlainTextSerializer constructor accepts an optional boolean specifying
  // whether the dictionary is already sorted lexicographically, in ascending
  // order.  If it is sorted, then passing true will optimize the construction
  // of the dictionary; you may pass false whether the dictionary is sorted or
  // not (pass false if you don't know whether the dictionary is sorted).
  final Serializer serializer = new PlainTextSerializer(false);
  dictionary = serializer.deserialize(SortedDawg.class, stream);
}

final ITransducer<Candidate> transducer = new TransducerBuilder()
  .dictionary(dictionary)
  .algorithm(Algorithm.TRANSPOSITION)
  .defaultMaxDistance(2)
  .includeDistance(true)
  .build();

for (final String queryTerm : new String[] {"foo", "bar"}) {
  System.out.println(
    "+-------------------------------------------------------------------------------");
  System.out.printf("| Spelling Candidates for Query Term: \"%s\"%n", queryTerm);
  System.out.println(
    "+-------------------------------------------------------------------------------");
  for (final Candidate candidate : transducer.transduce(queryTerm)) {
    System.out.printf("| d(\"%s\", \"%s\") = [%d]%n",
      queryTerm,
      candidate.term(),
      candidate.distance());
  }
}

// +-------------------------------------------------------------------------------
// | Spelling Candidates for Query Term: "foo"
// +-------------------------------------------------------------------------------
// | d("foo", "do") = [2]
// | d("foo", "of") = [2]
// | d("foo", "on") = [2]
// | d("foo", "to") = [2]
// | d("foo", "for") = [1]
// | d("foo", "not") = [2]
// | d("foo", "you") = [2]
// +-------------------------------------------------------------------------------
// | Spelling Candidates for Query Term: "bar"
// +-------------------------------------------------------------------------------
// | d("bar", "a") = [2]
// | d("bar", "as") = [2]
// | d("bar", "at") = [2]
// | d("bar", "be") = [2]
// | d("bar", "for") = [2]
// ...
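The distances in that output can be checked by hand with a small dynamic-programming sketch of an edit distance that counts the transposition of two adjacent characters as a single edit (the restricted Damerau-Levenshtein, or optimal string alignment, distance). The TranspositionDistance class is hypothetical and not part of liblevenshtein, and whether the library's TRANSPOSITION algorithm agrees with this restricted variant on every input is an assumption; it does agree with the values listed above.

```java
// Hypothetical helper for illustration only; not part of liblevenshtein.
class TranspositionDistance {

  // Optimal string alignment distance: insertions, deletions, substitutions,
  // and transpositions of two adjacent characters each cost one edit.
  static int distance(final String a, final String b) {
    final int[][] d = new int[a.length() + 1][b.length() + 1];
    for (int i = 0; i <= a.length(); i++) {
      d[i][0] = i; // i deletions
    }
    for (int j = 0; j <= b.length(); j++) {
      d[0][j] = j; // j insertions
    }
    for (int i = 1; i <= a.length(); i++) {
      for (int j = 1; j <= b.length(); j++) {
        final int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        d[i][j] = Math.min(
          Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), // deletion, insertion
          d[i - 1][j - 1] + cost);                    // match or substitution
        if (i > 1 && j > 1
            && a.charAt(i - 1) == b.charAt(j - 2)
            && a.charAt(i - 2) == b.charAt(j - 1)) {
          // The last two characters are swapped: count one transposition.
          d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
        }
      }
    }
    return d[a.length()][b.length()];
  }

  public static void main(final String[] args) {
    for (final String term : new String[] {"for", "of", "the"}) {
      System.out.printf("d(\"foo\", \"%s\") = [%d]%n", term, distance("foo", term));
    }
  }
}
```

With a maximum distance of 2, "for" (distance 1) and "of" (distance 2) are accepted, while "the" (distance 3) is not, which matches the listing above.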

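If you want to serialize your dictionary to a format that is easy to read later, do the following:

final Path serializedDictionaryPath =
  Paths.get("/path/to/top-20-most-common-english-words.protobuf.bytes");
try (final OutputStream stream = Files.newOutputStream(serializedDictionaryPath)) {
  final Serializer serializer = new ProtobufSerializer();
  serializer.serialize(dictionary, stream);
}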

Then, you can read the dictionary later, in much the same way you read the plain text version:

final SortedDawg deserializedDictionary;
try (final InputStream stream = Files.newInputStream(serializedDictionaryPath)) {
  final Serializer serializer = new ProtobufSerializer();
  deserializedDictionary = serializer.deserialize(SortedDawg.class, stream);
}

Serialization is not restricted to dictionaries; you may also (de)serialize transducers.

Please see the wiki for more details.