
RediSearch

A simple but powerful Ruby wrapper around RediSearch, a search engine built on top of Redis.

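Installation

First, Redis and RediSearch need to be installed.

You can download Redis from https://redis.io/download and consult the Redis installation instructions. Alternatively, on macOS or Linux you can install it via Homebrew.

To install RediSearch, see https://oss.redislabs.com/redisearch/Quick_Start.html. Once RediSearch is built, and if you are not using Docker, you can update your redis.conf file to always load the RediSearch module with loadmodule /path/to/redisearch.so. (On macOS, the redis.conf file can be found at /usr/local/etc/redis.conf.)

With Redis and RediSearch up and running, add the following line to your Gemfile: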

 gem 'redi_search'

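And then:

❯ bundle

Or install it yourself:

❯ gem install redi_search

And require it:

require 'redi_search'

Once the gem is installed and required, you need to configure it with your Redis settings. If you are on Rails, this should go in an initializer (config/initializers/redi_search.rb).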

RediSearch.configure do |config|
  config.redis_config = {
    host: "127.0.0.1",
    port: "6379"
  }
end

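Preface

RediSearch revolves around a search index, so let's start by defining what a search index is. According to Swiftype:

A search index is a body of structured data that a search engine refers to when looking for results that are relevant to a specific query. Indexes are a critical piece of any search system, since they must be tailored to the specific information retrieval method of the search engine's algorithm. In this manner, the algorithm and the index are inextricably linked to one another. Index can also be used as a verb (indexing), referring to the process of collecting unstructured website data in a structured format that is tailored for the search engine algorithm.
One way to think about indices is to consider the following analogy between a search infrastructure and an office filing system. Imagine you hand an intern a stack of thousands of pieces of paper (documents) and tell them to organize these pieces of paper in a filing cabinet (index) to help the company find information more efficiently. The intern will first have to sort through the papers and get a sense of all the information contained within them, then they will have to decide on a system for arranging them in the filing cabinet, and finally they will need to decide on the most effective manner for searching through and selecting from the files once they are in the cabinet. In this example, the process of organizing and filing the papers corresponds to the process of indexing website content, and the method for searching across these organized files and finding those that are most relevant corresponds to the search algorithm.

Schema

This defines the fields in an index and the properties of those fields. The schema is a simple DSL. Each field can be one of four types (geo, numeric, tag, or text) and can take a number of options. A simple example of a schema: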

RediSearch::Schema.new do
  text_field :first_name
  text_field :last_name
end

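The options supported by each type are as follows:

Text field

With no options: text_field :name

Options

weight (default: 1.0)
- Declares the importance of this field when calculating result accuracy. This is a multiplication factor.
- Ex: text_field :name, weight: 2
phonetic
- Performs phonetic matching on this field in searches by default. The required {matcher} argument specifies the phonetic algorithm and language used. The following matchers are supported:
  - dm:en - Double Metaphone for English
  - dm:fr - Double Metaphone for French
  - dm:pt - Double Metaphone for Portuguese
  - dm:es - Double Metaphone for Spanish
- Ex: text_field :name, phonetic: 'dm:en'
sortable (default: false)
- Allows the user to later sort the results by the value of this field (this adds memory overhead, so do not declare it on large text fields).
- Ex: text_field :name, sortable: true
no_index (default: false)
- The field will not be indexed. This is useful in conjunction with sortable, to create fields whose update using PARTIAL will not cause a full reindexing of the document. If a field has no_index and does not have sortable, it will simply be ignored by the index.
- Ex: text_field :name, no_index: true
no_stem (default: false)
- Disables stemming when indexing the field's values. This may be ideal for things like proper names.
- Ex: text_field :name, no_stem: true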

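Numeric field

With no options: numeric_field :price

Options

sortable (default: false)
- Allows the user to later sort the results by the value of this field (this adds memory overhead, so do not declare it on large text fields).
- Ex: numeric_field :id, sortable: true
no_index (default: false)
- The field will not be indexed. This is useful in conjunction with sortable, to create fields whose update using PARTIAL will not cause a full reindexing of the document. If a field has no_index and does not have sortable, it will simply be ignored by the index.
- Ex: numeric_field :id, no_index: true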

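Tag field

With no options: tag_field :tag

Options

sortable (default: false)
- Allows the user to later sort the results by the value of this field (this adds memory overhead, so do not declare it on large text fields).
- Ex: tag_field :tag, sortable: true
no_index (default: false)
- The field will not be indexed. This is useful in conjunction with sortable, to create fields whose update using PARTIAL will not cause a full reindexing of the document. If a field has no_index and does not have sortable, it will simply be ignored by the index.
- Ex: tag_field :tag, no_index: true
separator (default: ",")
- Indicates how the text contained in the field is split into individual tags. The default is a comma. The value must be a single character.
- Ex: tag_field :tag, separator: ','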

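Geo field

With no options: geo_field :place

Options

sortable (default: false)
- Allows the user to later sort the results by the value of this field (this adds memory overhead, so do not declare it on large text fields).
- Ex: geo_field :place, sortable: true
no_index (default: false)
- The field will not be indexed. This is useful in conjunction with sortable, to create fields whose update using PARTIAL will not cause a full reindexing of the document. If a field has no_index and does not have sortable, it will simply be ignored by the index.
- Ex: geo_field :place, no_index: true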
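
To make the relationship between the schema DSL and the field options more concrete, here is a minimal sketch of a block-based schema collector. It is not the gem's implementation: ToySchema and to_args are hypothetical names, and the mapping of options onto FT.CREATE-style keywords (NOSTEM, WEIGHT, PHONETIC, SEPARATOR, SORTABLE, NOINDEX) and their ordering are assumptions based on the RediSearch command syntax.

```ruby
# Toy schema collector (hypothetical, NOT the redi_search implementation).
class ToySchema
  TYPES = { text: "TEXT", numeric: "NUMERIC", tag: "TAG", geo: "GEO" }.freeze

  attr_reader :fields

  def initialize(&block)
    @fields = []
    # Evaluate the block against this instance so text_field etc. resolve here.
    instance_eval(&block) if block
  end

  # Defines text_field, numeric_field, tag_field and geo_field.
  TYPES.each_key do |type|
    define_method("#{type}_field") do |name, **options|
      @fields << [name, type, options]
    end
  end

  # Flattens the collected fields into FT.CREATE-style schema arguments.
  def to_args
    @fields.flat_map do |name, type, opts|
      args = [name.to_s, TYPES.fetch(type)]
      args << "NOSTEM" if opts[:no_stem]
      args.push("WEIGHT", opts[:weight].to_s) if opts[:weight]
      args.push("PHONETIC", opts[:phonetic]) if opts[:phonetic]
      args.push("SEPARATOR", opts[:separator]) if opts[:separator]
      args << "SORTABLE" if opts[:sortable]
      args << "NOINDEX" if opts[:no_index]
      args
    end
  end
end

schema = ToySchema.new do
  text_field :name, weight: 2, sortable: true
  numeric_field :price
  tag_field :tag, separator: ","
end

puts schema.to_args.join(" ")
```

Evaluating the block with instance_eval is what lets the field declarations read as bare method calls; the real gem may structure this differently.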

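Document

A Document is the Ruby representation of a Redis hash.

You can fetch a Document using the .get class method.

get(index, document_id) fetches a single Document in an Index for a given document_id.

You can also create a Document instance with the .for_object(index, record, only: []) class method. It takes an Index instance and a Ruby object. That object must respond to all the fields specified in the Index's Schema. only accepts an array of fields from the schema and limits the fields that are passed to the Document.

Once you have an instance of a Document, it responds to all the fields specified in the Index's Schema as methods, as well as to document_id. To ensure uniqueness, document_id is automatically prefixed with the Index's name unless it already carries that prefix. The Index name is prepended because, if you have two Documents with the same id in different Indexes, they should not overwrite each other. There is also a #document_id_without_index method, which removes the prepended index name.

Finally, there is a #del method that removes the Document from the Index.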
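
The document_id prefixing described above can be sketched in plain Ruby. The helper names below are hypothetical, and the exact id format (the index name concatenated directly with the id) is an assumption for illustration only; the gem may join the two parts differently.

```ruby
# Hypothetical helpers; the concatenation format is an assumption.
def prefixed_document_id(index_name, id)
  id = id.to_s
  # Leave the id alone when it already carries the index name.
  id.start_with?(index_name) ? id : "#{index_name}#{id}"
end

def id_without_index(index_name, id)
  id.to_s.delete_prefix(index_name)
end

puts prefixed_document_id("user_idx", 10039)
puts prefixed_document_id("post_idx", 10039)
puts prefixed_document_id("user_idx", "user_idx10039")
puts id_without_index("user_idx", "user_idx10039")
```

The same record id yields distinct keys for user_idx and post_idx, which is the collision the prefix is meant to prevent.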

Index

To initialize an Index, pass the name of the Index as a string or symbol, along with a Schema block.

RediSearch::Index.new(name_of_index) do
  text_field :foobar
end

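Available Commands

create
- Creates the index in the Redis instance and returns a boolean. Has an accompanying bang method that raises an exception upon failure. Returns false if the index already exists. Accepts a few options:
  - max_text_fields: #{true || false}
    - For efficiency, RediSearch encodes indexes differently if they are created with fewer than 32 text fields. This option forces RediSearch to encode indexes as if there were more than 32 text fields, which allows you to add additional fields (beyond 32) using add_field.
  - no_offsets: #{true || false}
    - If set, term offsets are not stored for documents (saves memory, but does not allow exact searches or highlighting). Implies no_highlight.
  - temporary: #{seconds}
    - Creates a lightweight temporary index that expires after the given number of seconds of inactivity. The internal idle timer is reset whenever the index is searched or added to. Because such indexes are lightweight, you can create thousands of them without negative performance implications.
  - no_highlight: #{true || false}
    - Conserves storage space and memory by disabling highlighting support. If set, the corresponding byte offsets for term positions are not stored. no_highlight is also implied by no_offsets.
  - no_fields: #{true || false}
    - If set, field bits are not stored for each term. Saves memory, but does not allow filtering by specific fields.
  - no_frequencies: #{true || false}
    - If set, term frequencies are not saved in the index. This saves memory, but does not allow sorting based on the frequency of a given term within the document.
drop(keep_docs: false)
- Drops the Index from the Redis instance and returns a boolean. Has an accompanying bang method that raises an exception upon failure. Returns false if the Index has already been dropped. Takes a keyword argument, keep_docs; with the default of false, all the document hashes in Redis are deleted as well.
exist?
- Returns a boolean signifying whether the Index exists.
info
- Returns a struct object with all the information about the Index.
fields
- Returns an array of the field names in the Index.
add(document)
- Takes a Document object. Has an accompanying bang method that raises an exception upon failure.
add_multiple(documents)
- Takes an array of Document objects. This provides a more performant way to add multiple documents to the Index. Accepts the same options as add.
del(document)
- Removes a Document from the Index.
document_count
- Returns the number of Documents in the Index.
add_field(name, type, **options, &block)
- Adds a new field to the Index.
- The block and the options are optional.
- Ex: index.add_field(:first_name, :text, phonetic: "dm:en")
reindex(documents, recreate: false)
- If recreate is true, the Index will be dropped and recreated.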
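
Several of the commands above come in two flavours: a plain method that returns a boolean and a bang method that raises on failure. The sketch below illustrates that convention with an in-memory stand-in. ToyIndex, CommandError and the registry are hypothetical and involve no Redis; they only mirror the return values described in the list.

```ruby
require "set"

# In-memory stand-in for an index registry (hypothetical, no Redis involved).
class ToyIndex
  class CommandError < StandardError; end

  REGISTRY = Set.new

  attr_reader :name

  def initialize(name)
    @name = name.to_s
  end

  def exist?
    REGISTRY.include?(name)
  end

  # Returns true on success, false if the index already exists.
  def create
    return false if exist?

    REGISTRY.add(name)
    true
  end

  # Same as create, but raises instead of returning false.
  def create!
    create || raise(CommandError, "index #{name} already exists")
  end

  # Returns true if the index was removed, false if it was not there.
  def drop
    !REGISTRY.delete?(name).nil?
  end

  def drop!
    drop || raise(CommandError, "index #{name} does not exist")
  end
end

index = ToyIndex.new(:user_idx)
puts index.create
puts index.create
begin
  index.create!
rescue ToyIndex::CommandError => e
  puts "create! raised: #{e.message}"
end
```

The non-bang form suits idempotent setup code, while the bang form is convenient where a failure should stop execution.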

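Searching

Searching is initiated off a RediSearch::Index instance, with clauses that can be chained together. A search returns an array of Documents, which have public reader methods for all the schema fields.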

main ❯ index = RediSearch::Index.new("user_idx") { text_field :name, phonetic: "dm:en" }
main ❯ index.add RediSearch::Document.for_object(index, User.new("10039", "Gene", "Volkman"))
main ❯ index.add RediSearch::Document.for_object(index, User.new("9998", "Jeannie", "Ledner"))
main ❯ index.search("john")
  RediSearch (1.1ms)  FT.SEARCH user_idx `john`
=> [#<RediSearch::Document:0x00007f862e241b78 first: "Gene", last: "Volkman", document_id: "10039">,
 #<RediSearch::Document:0x00007f862e2417b8 first: "Jeannie", last: "Ledner", document_id: "9998">]

Simple phrase query - hello AND world

index.search("hello").and("world")

Exact phrase query - hello FOLLOWED BY world

index.search("hello world")

Union query - hello OR world

index.search("hello").or("world")

Negation query - hello AND NOT world

index.search("hello").and.not("world")

Complex intersections and unions:

# Intersection of unions
index.search(index.search("hello").or("halo")).and(index.search("world").or("werld"))
# Negation of union
index.search("hello").and.not(index.search("world").or("werld"))
# Union inside phrase
index.search("hello").and(index.search("world").or("werld"))
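
To show how chained clauses like the ones above could translate into a RediSearch query string, here is a toy builder. ToyQuery is hypothetical and is not how the gem builds queries; the output follows the RediSearch query syntax as commonly documented (space for intersection, | for union, - for negation, parentheses for grouping), and the gem's actual output may differ, for example in how it quotes terms.

```ruby
# Toy query builder (hypothetical); only mimics the chaining style.
class ToyQuery
  def initialize(term)
    @out = render(term)
  end

  # With no argument this is a no-op, which allows .and.not(...).
  def and(term = nil)
    return self if term.nil?

    @out += " #{render(term)}"
    self
  end

  def or(term)
    @out += "|#{render(term)}"
    self
  end

  def not(term)
    @out += " -#{render(term)}"
    self
  end

  def to_s
    @out
  end

  private

  # Nested queries are wrapped in parentheses to keep their grouping.
  def render(term)
    term.is_a?(ToyQuery) ? "(#{term})" : term.to_s
  end
end

puts ToyQuery.new("hello").and("world")
puts ToyQuery.new("hello").or("world")
puts ToyQuery.new("hello").and.not("world")
puts ToyQuery.new(ToyQuery.new("hello").or("halo")).and(ToyQuery.new("world").or("werld"))
puts ToyQuery.new("hello").and.not(ToyQuery.new("world").or("werld"))
```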

All terms support a few options that can be applied.

Prefix terms: match all terms starting with a prefix (akin to like term% in SQL).

index.search("hel", prefix: true)
index.search("hello worl", prefix: true)
index.search("hel", prefix: true).and("worl", prefix: true)
index.search("hello").and.not("worl", prefix: true)

Optional terms: documents containing the optional terms will rank higher than those without them.

index.search("foo").and("bar", optional: true).and("baz", optional: true)

Fuzzy terms: matches are performed based on Levenshtein distance (LD). The maximum Levenshtein distance supported is 3.

index.search("zuchini", fuzziness: 1)
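
For readers unfamiliar with Levenshtein distance, the sketch below computes it with the classic dynamic-programming recurrence, which shows why "zuchini" matches "zucchini" at a fuzziness of 1. It also includes a hypothetical fuzzy_term helper that wraps a term in one % per unit of distance; that notation is an assumption based on the RediSearch query syntax and is not taken from the gem.

```ruby
# Levenshtein distance: minimum number of single-character insertions,
# deletions, and substitutions needed to turn a into b.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Hypothetical helper: wraps a term in % markers, one per unit of distance.
def fuzzy_term(term, distance)
  raise ArgumentError, "distance must be between 1 and 3" unless (1..3).cover?(distance)

  marker = "%" * distance
  "#{marker}#{term}#{marker}"
end

puts levenshtein("zuchini", "zucchini")
puts levenshtein("kitten", "sitting")
puts fuzzy_term("zuchini", 1)
```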

You can also use a where clause to scope search terms to a specific field:

# Simple field specific query
index.search.where(name: "john")
# Using where with options
index.search.where(first: "jon", fuzziness: 1)
# Using where with more complex query
index.search.where(first: index.search("bill").or("bob"))

Searching numeric fields takes a range:

index.search.where(number: 0..100)
# Searching to infinity
index.search.where(number: 0..Float::INFINITY)
index.search.where(number: -Float::INFINITY..0)
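
As a rough illustration of how a Ruby Range could map onto a RediSearch numeric filter, the sketch below renders a range as @field:[min max], using +inf and -inf for unbounded ends and a leading ( for an exclusive upper bound. The helper names are hypothetical and the mapping is an assumption based on the RediSearch query syntax, not a description of the gem's internals.

```ruby
# Hypothetical helpers; the filter format is an assumption.
def format_bound(value)
  return "+inf" if value == Float::INFINITY
  return "-inf" if value == -Float::INFINITY

  value.to_s
end

def numeric_filter(field, range)
  max = format_bound(range.end)
  # An exclusive range (0...100) marks its upper bound with "(".
  max = "(#{max}" if range.exclude_end?
  "@#{field}:[#{format_bound(range.begin)} #{max}]"
end

puts numeric_filter(:number, 0..100)
puts numeric_filter(:number, 0..Float::INFINITY)
puts numeric_filter(:number, -Float::INFINITY..0)
puts numeric_filter(:number, 0...100)
```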

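Query level clauses

slop(level)
- Allows a maximum of N intervening unmatched offsets between phrase terms (i.e. the slop for exact phrases is 0).
in_order
- Usually used in conjunction with slop. Ensures that the query terms appear in the Document in the same order as in the query, regardless of the offsets between them.
no_content
- Returns only the Document ids and not the content. This is useful if RediSearch is used on a Rails model where the Document attributes do not matter and the results are converted into ActiveRecord objects.
language(language)
- Uses a stemmer for the supplied language during search for query expansion. If querying Documents in Chinese, this should be set to chinese in order to properly tokenize the query terms. If an unsupported language is sent, the command returns an error.
sort_by(field, order: :asc)
- If the supplied field is a sortable field, the results are ordered by the value of this field. This applies to both text and numeric fields. Available orders are :asc and :desc.
limit(num, offset = 0)
- Limits the results to num entries starting at offset. The default limit is 10.
count
- Returns the number of Documents found by the search query.
highlight(fields: [], opening_tag: "<b>", closing_tag: "</b>")
- Use this option to format occurrences of matched text. fields is an array of the fields to be highlighted.
verbatim
- Does not try to use stemming for query expansion, but searches for the query terms verbatim.
no_stop_words
- Does not filter stop words from the query.
with_scores
- Includes the relative internal score of each Document. This can be used to merge results from multiple instances. It adds a score method to the returned Document instances.
return(*fields)
- Limits which fields of the Document are returned.
explain
- Returns the execution plan for a complex query. In the returned response, a + on a term indicates stemming.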
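
Because limit takes a result count and an offset rather than a page number, page-based navigation needs a small conversion. The helper below is a hypothetical sketch of that arithmetic; limit_args is not part of the gem, and the default of 10 per page simply mirrors the default limit mentioned above.

```ruby
# Hypothetical helper: converts a 1-based page number into [num, offset],
# the two arguments that limit(num, offset) expects.
def limit_args(page:, per_page: 10)
  raise ArgumentError, "page must be >= 1" if page < 1
  raise ArgumentError, "per_page must be >= 1" if per_page < 1

  [per_page, (page - 1) * per_page]
end

num, offset = limit_args(page: 3, per_page: 25)
puts "limit(#{num}, #{offset})"

# The same window applied to a plain array, to show which rows it selects.
rows = (1..100).to_a
puts rows[offset, num].first
puts rows[offset, num].last
```

With a search object in hand, the resulting pair would be passed as index.search("hello").limit(num, offset).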

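Spellcheck

Spellchecking is initiated off a RediSearch::Index instance and provides suggestions for misspelled search terms. It takes an optional distance argument, which is the maximum Levenshtein distance for spelling suggestions. It returns an array in which each element contains the suggestions for one search term, along with a normalized score based on the suggestion's occurrences in the index.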

main ❯ index = RediSearch::Index.new("user_idx") { text_field :name, phonetic: "dm:en" }
main ❯ index.spellcheck("jimy")
  RediSearch (1.1ms)  FT.SPELLCHECK user_idx jimy DISTANCE 1
  => [#<RediSearch::Spellcheck::Result:0x00007f805591c670
    term: "jimy",
    suggestions:
     [#<struct RediSearch::Spellcheck::Suggestion score=0.0006849315068493151, suggestion="jimmy">,
      #<struct RediSearch::Spellcheck::Suggestion score=0.00019569471624266145, suggestion="jim">]>]
main ❯ index.spellcheck("jimy", distance: 2).first.suggestions
  RediSearch (0.5ms)  FT.SPELLCHECK user_idx jimy DISTANCE 2
=> [#<struct RediSearch::Spellcheck::Suggestion score=0.0006849315068493151, suggestion="jimmy">,
 #<struct RediSearch::Spellcheck::Suggestion score=0.00019569471624266145, suggestion="jim">]
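
A common follow-up is to pick the highest-scoring suggestion for each misspelled term. The sketch below does that with plain structs that only mirror the shape of the output shown above (term, suggestions, score, suggestion). ToySuggestion, ToyResult and best_corrections are hypothetical stand-ins, not the gem's classes.

```ruby
# Stand-ins that mirror the shape of the spellcheck output (hypothetical).
ToySuggestion = Struct.new(:score, :suggestion)
ToyResult = Struct.new(:term, :suggestions)

# Maps each misspelled term to its highest-scoring suggestion.
# Terms without any suggestion are left out of the result.
def best_corrections(results)
  results.each_with_object({}) do |result, corrections|
    best = result.suggestions.max_by(&:score)
    corrections[result.term] = best.suggestion if best
  end
end

results = [
  ToyResult.new("jimy", [
    ToySuggestion.new(0.00019569471624266145, "jim"),
    ToySuggestion.new(0.0006849315068493151, "jimmy")
  ]),
  ToyResult.new("xqzv", [])
]

puts best_corrections(results).inspect
```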

Rails Integration

Integration with Rails is straightforward. Call redi_search with a schema block from inside your model. Ex:

class User < ApplicationRecord
  redi_search do
    text_field :first, phonetic: "dm:en"
    text_field :last, phonetic: "dm:en"
  end
end

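This automatically adds User.search and User.spellcheck methods, which behave the same as if you called them on an Index instance.

User.reindex(recreate: false, only: []) is also added and behaves similarly to RediSearch::Index#reindex. Some of the differences include:

- Documents do not need to be passed as the first parameter. The search_import scope is called automatically and all the records are converted to Documents.
- It accepts an optional only parameter, where you can specify a limited number of fields to update. This is useful if you alter the schema and only need to index a particular field.

While defining the schema, you can optionally pass a block to a field. If no block is passed, the field's name is called on the model to get the value. If a block is passed, the value for the field is obtained by calling the block.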

class User < ApplicationRecord
  redi_search do
    text_field :name do
      "#{first_name} #{last_name}"
    end
  end
end
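
The rule for resolving a field's value (call the block if one was given, otherwise call the method named after the field) can be sketched without Rails. Person and field_value below are hypothetical; evaluating the block with instance_exec against the record is one plausible way to let the block reference first_name and last_name directly, and is an assumption rather than a description of the gem's code.

```ruby
# Plain-Ruby stand-in for a model (hypothetical).
Person = Struct.new(:first_name, :last_name)

# Resolves a field's value: evaluate the block in the context of the record
# if one is given, otherwise call the method named after the field.
def field_value(record, name, &block)
  block ? record.instance_exec(&block) : record.public_send(name)
end

person = Person.new("Gene", "Volkman")

puts field_value(person, :first_name)
puts field_value(person, :name) { "#{first_name} #{last_name}" }
```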

You can override the search_import scope on the model to eager load relationships when indexing, or to limit the records that get indexed.

class User < ApplicationRecord
  scope :search_import, -> { includes(:posts) }
end

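When searching, a collection of Documents is returned by default. Calling #results on the search query executes the search, then looks up all the found records in the database and returns an ActiveRecord relation.

The default Index name for a model is #{model_name.plural}_#{RediSearch.env}. The redi_search method takes an optional index_prefix argument, which is prepended to the index name: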

class User < ApplicationRecord
  redi_search index_prefix: 'prefix' do
    text_field :first, phonetic: "dm:en"
    text_field :last, phonetic: "dm:en"
  end
end

User.search_index.name
# => prefix_users_development
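
The naming rule above amounts to joining the optional prefix, the pluralized model name, and the environment with underscores. The helper below is a hypothetical sketch of that composition; default_index_name is not part of the gem, and the underscore separator after the prefix is inferred from the example output.

```ruby
# Hypothetical helper: composes an index name from its parts,
# skipping the prefix when none is given.
def default_index_name(plural_model_name, env, index_prefix: nil)
  [index_prefix, plural_model_name, env].compact.join("_")
end

puts default_index_name("users", "development")
puts default_index_name("users", "development", index_prefix: "prefix")
```

Including the environment in the name keeps development, test, and production indexes separate even when they share a Redis instance.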

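When RediSearch is integrated into a model, records are automatically indexed after they are created or updated, and removed from the Index when they are destroyed.

There are a few more convenience methods that are publicly available:

search_document
- Returns the record as a RediSearch::Document instance
remove_from_index
- Removes the record from the Index
add_to_index
- Adds the record to the Index
search_index
- Returns the RediSearch::Index instance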

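Development

After checking out the repo, run bin/setup to install dependencies. Then run rake test to run both the unit and integration tests. To run them individually, use rake test:unit or rake test:integration. You can also run bin/console for an interactive prompt that allows you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, run bin/publish (major|minor|patch), which updates the version number in version.rb, creates a git tag for the version, pushes git commits and tags, and pushes the .gem file to rubygems.org and GitHub.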