RediSearch

A simple but powerful Ruby wrapper around RediSearch, a search engine built on top of Redis.

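Installation

First, Redis and RediSearch need to be installed.

You can download Redis from https://redis.io/download and consult the Redis installation instructions. Alternatively, on macOS or Linux you can install it via Homebrew.

To install RediSearch, see https://oss.redislabs.com/redisearch/Quick_Start.html. Once RediSearch is built, and if you are not using Docker, you can update your redis.conf file to always load the RediSearch module with loadmodule /path/to/redisearch.so. (On macOS the redis.conf file can be found at /usr/local/etc/redis.conf.)

Once Redis and RediSearch are up and running, add the following line to your Gemfile: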

 gem 'redi_search'

And then:

❯ bundle

Or install it yourself:

❯ gem install redi_search

And require it:

 require 'redi_search'

Once the gem is installed and required, you need to configure it with your Redis configuration. If you are on Rails, this belongs in an initializer (config/initializers/redi_search.rb).

 RediSearch.configure do |config|
   config.redis_config = {
     host: "127.0.0.1",
     port: "6379"
   }
 end

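Preface

RediSearch revolves around a search index, so let's start by defining what a search index is. According to Swiftype:

A search index is a body of structured data that a search engine refers to when looking for results that are relevant to a specific query. Indexes are a critical piece of any search system, since they must be tailored to the specific information retrieval method of the search engine's algorithm. In this manner, the algorithm and the index are inextricably linked to one another. Index can also be used as a verb (indexing), referring to the process of collecting unstructured website data in a structured format that is tailored for the search engine algorithm.
One way to think about indices is to consider the following analogy between a search infrastructure and an office filing system. Imagine you hand an intern a stack of thousands of pieces of paper (documents) and tell them to organize these pieces of paper in a filing cabinet (index) to help the company find information more efficiently. The intern will first have to sort through the papers and get a sense of all the information contained within them, then they will have to decide on a system for arranging them in the filing cabinet, and finally they will need to decide on the most effective manner for searching through and selecting from the files once they are in the cabinet. In this example, the process of organizing and filing the papers corresponds to the process of indexing website content, and the method for searching across these organized files and finding those that are most relevant corresponds to the search algorithm.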
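
To make the filing analogy concrete, here is a minimal sketch of an inverted index in plain Ruby. ToyIndex and its methods are hypothetical names used only for illustration; this is not how RediSearch stores data internally, and the tokenizer is deliberately naive (lowercase ASCII words only).

```ruby
# Toy inverted index: maps each lowercased word to the ids of the
# documents that contain it. Illustrative only.
class ToyIndex
  def initialize
    @postings = Hash.new { |hash, term| hash[term] = [] }
  end

  # "Filing" a document: record which terms appear in it.
  def add(id, text)
    tokenize(text).uniq.each { |term| @postings[term] << id }
  end

  # "Searching": return the ids of documents containing every query term.
  def search(query)
    terms = tokenize(query)
    return [] if terms.empty?

    terms.map { |term| @postings.fetch(term, []) }.reduce(:&)
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/)
  end
end

index = ToyIndex.new
index.add(1, "Hello world")
index.add(2, "Hello Ruby")
index.search("hello")      # => [1, 2]
index.search("hello ruby") # => [2]
```

The work happens at filing time (add), so a lookup only has to intersect a few small lists instead of scanning every document.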

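Schema

This defines the fields in the index and the properties of those fields. The schema is a simple DSL. Each field can be one of four types (geo, numeric, tag, or text) and can take a number of options. A simple example of a schema is:

 RediSearch::Schema.new do
   text_field :first_name
   text_field :last_name
 end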
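
For readers curious how a block-based DSL like this can work, the following is a rough sketch using instance_eval. ToySchema is a hypothetical class written for this illustration; it is not the gem's RediSearch::Schema and makes no claim about how the gem is implemented.

```ruby
# Minimal sketch of a block-based schema DSL (hypothetical ToySchema class).
class ToySchema
  FIELD_TYPES = %i[text numeric tag geo].freeze

  attr_reader :fields

  def initialize(&block)
    @fields = {}
    # Run the block with self set to this schema, so bare calls such as
    # `text_field :name` resolve to the methods defined below.
    instance_eval(&block) if block
  end

  # Defines text_field, numeric_field, tag_field and geo_field.
  FIELD_TYPES.each do |type|
    define_method(:"#{type}_field") do |name, **options|
      @fields[name] = { type: type, options: options }
    end
  end
end

schema = ToySchema.new do
  text_field :first_name, weight: 2
  numeric_field :age, sortable: true
end
schema.fields.keys # => [:first_name, :age]
```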

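The options supported by each type are as follows:

Text field

With no options: text_field :name

Options

weight (default: 1.0)
- Declares the importance of this field when calculating result accuracy. This is a multiplication factor.
- Ex: text_field :name, weight: 2
phonetic
- Performs phonetic matching on the field in searches by default. The obligatory {matcher} argument specifies the phonetic algorithm and language used. The following matchers are supported:
  - dm:en - Double Metaphone for English
  - dm:fr - Double Metaphone for French
  - dm:pt - Double Metaphone for Portuguese
  - dm:es - Double Metaphone for Spanish
- Ex: text_field :name, phonetic: 'dm:en'
sortable (default: false)
- Allows the user to later sort the results by the value of this field (this adds memory overhead, so do not declare it on large text fields).
- Ex: text_field :name, sortable: true
no_index (default: false)
- The field will not be indexed. This is useful in conjunction with sortable, to create fields whose update using PARTIAL will not cause a full reindexing of the document. If a field has no_index and does not have sortable, it will simply be ignored by the index.
- Ex: text_field :name, no_index: true
no_stem (default: false)
- Disables stemming when indexing the field's values. This may be ideal for things like proper names.
- Ex: text_field :name, no_stem: true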

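Numeric field

With no options: numeric_field :price

Options

sortable (default: false)
- Allows the user to later sort the results by the value of this field (this adds memory overhead, so do not declare it on large text fields).
- Ex: numeric_field :id, sortable: true
no_index (default: false)
- The field will not be indexed. This is useful in conjunction with sortable, to create fields whose update using PARTIAL will not cause a full reindexing of the document. If a field has no_index and does not have sortable, it will simply be ignored by the index.
- Ex: numeric_field :id, no_index: true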

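Tag field

With no options: tag_field :tag

Options

sortable (default: false)
- Allows the user to later sort the results by the value of this field (this adds memory overhead, so do not declare it on large text fields).
- Ex: tag_field :tag, sortable: true
no_index (default: false)
- The field will not be indexed. This is useful in conjunction with sortable, to create fields whose update using PARTIAL will not cause a full reindexing of the document. If a field has no_index and does not have sortable, it will simply be ignored by the index.
- Ex: tag_field :tag, no_index: true
separator (default: ",")
- Indicates how the text contained in the field is to be split into individual tags. The default is ,. The value must be a single character.
- Ex: tag_field :tag, separator: ','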
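
As a rough illustration of what the separator option controls, the sketch below splits a raw field value into individual tags. split_tags is a hypothetical helper, and trimming whitespace around each tag is an assumption of this sketch; the actual splitting is performed by RediSearch itself.

```ruby
# Illustrative only: how a tag field's raw value could be split into tags.
def split_tags(value, separator: ",")
  raise ArgumentError, "separator must be a single character" unless separator.length == 1

  # Assumption: surrounding whitespace is trimmed and empty tags are dropped.
  value.split(separator).map(&:strip).reject(&:empty?)
end

split_tags("ruby, redis,search")         # => ["ruby", "redis", "search"]
split_tags("ruby;redis", separator: ";") # => ["ruby", "redis"]
```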

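Geo field

With no options: geo_field :place

Options

sortable (default: false)
- Allows the user to later sort the results by the value of this field (this adds memory overhead, so do not declare it on large text fields).
- Ex: geo_field :place, sortable: true
no_index (default: false)
- The field will not be indexed. This is useful in conjunction with sortable, to create fields whose update using PARTIAL will not cause a full reindexing of the document. If a field has no_index and does not have sortable, it will simply be ignored by the index.
- Ex: geo_field :place, no_index: true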

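Document

A Document is the Ruby representation of a Redis hash.

You can fetch a Document using the .get class method.

get(index, document_id) fetches a single Document from an Index for the given document_id.

You can also create a Document instance using the .for_object(index, record, only: []) class method. It takes an Index instance and a Ruby object. That object must respond to all the fields specified in the Index's Schema. only accepts an array of fields from the schema and limits the fields that are passed to the Document.

Once you have an instance of a Document, it responds to all the fields specified in the Index's Schema as methods, as well as to document_id. The document_id is automatically prepended with the Index's name, unless it already is, to ensure uniqueness. We prepend the Index name because if you have two Documents with the same id in different Indexes, we do not want them to overwrite each other. There is also a #document_id_without_index method, which removes the prepended index name.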
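
The prefixing rule can be sketched in a few lines of plain Ruby. Both helpers below are hypothetical, and joining the index name and the id with no separator is an assumption of this sketch; the gem's internal implementation may differ.

```ruby
# Hypothetical helpers illustrating the prefixing rule described above.
def prefixed_document_id(index_name, id)
  id = id.to_s
  # Leave ids that already carry the index name untouched.
  id.start_with?(index_name) ? id : "#{index_name}#{id}"
end

def document_id_without_index(index_name, document_id)
  document_id.to_s.delete_prefix(index_name)
end

prefixed_document_id("user_idx", 10039)                # => "user_idx10039"
prefixed_document_id("user_idx", "user_idx10039")      # => "user_idx10039"
document_id_without_index("user_idx", "user_idx10039") # => "10039"
```

Because the index name is part of the key, a document with id 10039 in one index cannot collide with id 10039 in another.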

Finally, there is a #del method that removes the Document from the Index.

Index

To initialize an Index, pass the name of the Index as a string or symbol, along with the Schema block.

 RediSearch::Index.new(name_of_index) do
   text_field :foobar
 end

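Available commands

create
- Creates the index in the Redis instance and returns a boolean. Has an accompanying bang method that raises an exception upon failure. Returns false if the index already exists. Accepts a few options:
  - max_text_fields: #{true || false}
    - For efficiency, RediSearch encodes indexes differently if they are created with fewer than 32 text fields. This option forces RediSearch to encode the index as if there were more than 32 text fields, which allows you to add additional fields (beyond 32) using add_field.
  - no_offsets: #{true || false}
    - If set, term offsets are not stored for documents (saves memory, but does not allow exact searches or highlighting). Implies no_highlight.
  - temporary: #{seconds}
    - Creates a lightweight temporary index that expires after the given number of seconds of inactivity. The internal idle timer is reset whenever the index is searched or added to. Because such indexes are lightweight, you can create thousands of them without negative performance implications.
  - no_highlight: #{true || false}
    - Conserves storage space and memory by disabling highlighting support. If set, the corresponding byte offsets for term positions are not stored. no_highlight is also implied by no_offsets.
  - no_fields: #{true || false}
    - If set, field bits are not stored for each term. Saves memory, but does not allow filtering by specific fields.
  - no_frequencies: #{true || false}
    - If set, term frequencies are not saved in the index. This saves memory, but does not allow sorting based on the frequency of a given term within the document.
drop(keep_docs: false)
- Drops the Index from the Redis instance and returns a boolean. Has an accompanying bang method that raises an exception upon failure. Returns false if the Index has already been dropped. Takes an optional keyword argument, keep_docs; by default all the document hashes in Redis are removed as well.
exist?
- Returns a boolean signifying whether the Index exists.
info
- Returns a struct object with all the information about the Index.
fields
- Returns an array of the field names in the Index.
add(document)
- Takes a Document object. Has an accompanying bang method that raises an exception upon failure.
add_multiple(documents)
- Takes an array of Document objects. This provides a more performant way to add multiple documents to the Index. Accepts the same options as add.
del(document)
- Removes a Document from the Index.
document_count
- Returns the number of Documents in the Index.
add_field(name, type, **options, &block)
- Adds a new field to the Index.
- The block and options are optional.
- Ex: index.add_field(:first_name, :text, phonetic: "dm:en")
reindex(documents, recreate: false)
- If recreate is true, the Index will be dropped and recreated.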
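
The create options listed above correspond to flags of RediSearch's FT.CREATE command. The sketch below shows one way such a mapping could look; create_flags is a hypothetical helper, not the gem's code, and it only builds the flag portion of the command.

```ruby
# Sketch: translate the create options into FT.CREATE flags.
# The flag names come from RediSearch's FT.CREATE command; the helper itself
# is hypothetical and silently ignores options it does not know.
def create_flags(temporary: nil, **options)
  flag_for_option = {
    max_text_fields: "MAXTEXTFIELDS",
    no_offsets: "NOOFFSETS",
    no_highlight: "NOHL",
    no_fields: "NOFIELDS",
    no_frequencies: "NOFREQS"
  }

  # Keep only the flags whose option was passed with a truthy value.
  flags = flag_for_option.select { |option, _flag| options[option] }.values
  flags += ["TEMPORARY", temporary.to_s] if temporary
  flags
end

create_flags(no_offsets: true, temporary: 60)
# => ["NOOFFSETS", "TEMPORARY", "60"]
```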

Searching

Searching is initiated from a RediSearch::Index instance, with clauses that can be chained together. A search returns an array of Documents, which have public reader methods for all the schema fields.

 main ❯ index = RediSearch::Index.new("user_idx") { text_field :name, phonetic: "dm:en" }
 main ❯ index.add RediSearch::Document.for_object(index, User.new("10039", "Gene", "Volkman"))
 main ❯ index.add RediSearch::Document.for_object(index, User.new("9998", "Jeannie", "Ledner"))
 main ❯ index.search("john")
   RediSearch (1.1ms)  FT.SEARCH user_idx `john`
 => [#<RediSearch::Document:0x00007f862e241b78 first: "Gene", last: "Volkman", document_id: "10039">,
  #<RediSearch::Document:0x00007f862e2417b8 first: "Jeannie", last: "Ledner", document_id: "9998">]

Simple phrase query - hello AND world

 index.search("hello").and("world")

Exact phrase query - hello FOLLOWED BY world

 index.search("hello world")

Union query - hello OR world

 index.search("hello").or("world")

Negation query - hello AND NOT world

 index.search("hello").and.not("world")

Complex intersections and unions:

 # Intersection of unions
 index.search(index.search("hello").or("halo")).and(index.search("world").or("werld"))
 # Negation of union
 index.search("hello").and.not(index.search("world").or("werld"))
 # Union inside phrase
 index.search("hello").and(index.search("world").or("werld"))

All terms support a few options that can be applied.

Prefix terms: match all terms starting with a prefix (akin to like term% in SQL).

 index.search("hel", prefix: true)
 index.search("hello worl", prefix: true)
 index.search("hel", prefix: true).and("worl", prefix: true)
 index.search("hello").and.not("worl", prefix: true)

Optional terms: documents containing the optional terms will rank higher than those without.

 index.search("foo").and("bar", optional: true).and("baz", optional: true)

Fuzzy terms: matches are performed based on Levenshtein distance (LD). The maximum Levenshtein distance supported is 3.

 index.search("zuchini", fuzziness: 1)
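
For orientation, the three term options correspond to modifiers in RediSearch's raw query syntax: a trailing * for prefix matching, a leading ~ for optional terms, and one pair of % per unit of Levenshtein distance for fuzzy matching. The sketch below renders a single term that way; render_term is a hypothetical helper, and the exact string the gem sends to Redis may differ (for example in how it quotes terms).

```ruby
# Sketch of RediSearch's raw query syntax for the three term options.
# `render_term` is a hypothetical helper, not part of the gem, and it does
# not validate whether a combination of options is meaningful.
def render_term(term, prefix: false, optional: false, fuzziness: 0)
  raise ArgumentError, "fuzziness must be between 0 and 3" unless (0..3).cover?(fuzziness)

  rendered = term.to_s
  rendered = "#{rendered}*" if prefix     # prefix match
  marks = "%" * fuzziness                 # one % pair per unit of distance
  rendered = "#{marks}#{rendered}#{marks}"
  rendered = "~#{rendered}" if optional   # optional term
  rendered
end

render_term("hel", prefix: true)     # => "hel*"
render_term("bar", optional: true)   # => "~bar"
render_term("zuchini", fuzziness: 1) # => "%zuchini%"
```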

Search terms can also be scoped to specific fields using a where clause:

 # Simple field specific query
 index.search.where(name: "john")
 # Using where with options
 index.search.where(first: "jon", fuzziness: 1)
 # Using where with more complex query
 index.search.where(first: index.search("bill").or("bob"))

Searching numeric fields takes a range:

 index.search.where(number: 0..100)
 # Searching to infinity
 index.search.where(number: 0..Float::INFINITY)
 index.search.where(number: -Float::INFINITY..0)
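
In RediSearch's query syntax a numeric filter is written as @field:[min max], with -inf and +inf for open ends. The following sketch shows how a Ruby Range could be rendered that way; numeric_filter is a hypothetical helper that handles only inclusive ranges with numeric or infinite endpoints, and it is not taken from the gem.

```ruby
# Sketch: render a Ruby Range as a RediSearch numeric filter.
# `numeric_filter` is a hypothetical helper; only inclusive ranges with
# numeric or infinite endpoints are handled.
def numeric_filter(field, range)
  bound = lambda do |value|
    next "+inf" if value == Float::INFINITY
    next "-inf" if value == -Float::INFINITY

    value.to_s
  end

  "@#{field}:[#{bound.call(range.begin)} #{bound.call(range.end)}]"
end

numeric_filter(:number, 0..100)              # => "@number:[0 100]"
numeric_filter(:number, 0..Float::INFINITY)  # => "@number:[0 +inf]"
numeric_filter(:number, -Float::INFINITY..0) # => "@number:[-inf 0]"
```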

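Query level clauses

slop(level)
- Allows a maximum of N intervening unmatched offsets between phrase terms (i.e. the slop for exact phrases is 0).
in_order
- Usually used in conjunction with slop. Ensures that the query terms appear in the Document in the same order as in the query, regardless of the offsets between them.
no_content
- Returns only the Document ids and not the content. This is useful if RediSearch is used on a Rails model, where the Document attributes do not matter and the results are converted into ActiveRecord objects.
language(language)
- Uses a stemmer for the supplied language during query expansion. When querying Documents in Chinese, this should be set to chinese in order to properly tokenize the query terms. If an unsupported language is sent, the command returns an error.
sort_by(field, order: :asc)
- If the supplied field is a sortable field, the results are ordered by the value of this field. This applies to both text and numeric fields. Available orders are :asc and :desc.
limit(num, offset = 0)
- Limits the results to the specified num, starting at offset. The default limit is 10.
count
- Returns the number of Documents found by the search query.
highlight(fields: [], opening_tag: "<b>", closing_tag: "</b>")
- Use this option to format occurrences of matched text. fields is an array of the fields to be highlighted.
verbatim
- Does not try to use stemming for query expansion, but searches for the query terms verbatim.
no_stop_words
- Does not filter stopwords from the query.
with_scores
- Includes the relative internal score of each Document. This can be used to merge results from multiple instances. It adds a score method to the returned Document instances.
return(*fields)
- Limits which fields of the Document are returned.
explain
- Returns the execution plan for a complex query. In the returned response, a + on a term indicates stemming.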
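
Because limit takes a result count followed by an offset, paginated views need a small conversion from page numbers. The helper below is a hypothetical convenience, not part of the gem, and assumes 1-based page numbers.

```ruby
# Hypothetical helper: convert a 1-based page number into the
# (num, offset) pair expected by limit(num, offset).
def page_to_limit(page, per_page: 10)
  raise ArgumentError, "page must be >= 1" if page < 1

  [per_page, (page - 1) * per_page]
end

num, offset = page_to_limit(3, per_page: 25)
# num is 25 and offset is 50, so the third page could then be requested
# with something like index.search("hello").limit(num, offset)
```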

Spellcheck

Spellchecking is initiated from a RediSearch::Index instance and provides suggestions for misspelled search terms. It takes an optional distance argument, which is the maximum Levenshtein distance for spelling suggestions. It returns an array in which each element contains the suggestions for one search term, along with a normalized score based on each suggestion's occurrences in the index.

 main ❯ index = RediSearch::Index.new("user_idx") { text_field :name, phonetic: "dm:en" }
 main ❯ index.spellcheck("jimy")
   RediSearch (1.1ms)  FT.SPELLCHECK user_idx jimy DISTANCE 1
   => [#<RediSearch::Spellcheck::Result:0x00007f805591c670
     term: "jimy",
     suggestions:
      [#<struct RediSearch::Spellcheck::Suggestion score=0.0006849315068493151, suggestion="jimmy">,
       #<struct RediSearch::Spellcheck::Suggestion score=0.00019569471624266145, suggestion="jim">]>]
 main ❯ index.spellcheck("jimy", distance: 2).first.suggestions
   RediSearch (0.5ms)  FT.SPELLCHECK user_idx jimy DISTANCE 2
 => [#<struct RediSearch::Spellcheck::Suggestion score=0.0006849315068493151, suggestion="jimmy">,
  #<struct RediSearch::Spellcheck::Suggestion score=0.00019569471624266145, suggestion="jim">]
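
To give a feel for what a distance-bounded suggestion involves, here is a self-contained sketch in plain Ruby. It assumes the score is the share of documents containing the suggested term; the term counts are made-up data, and levenshtein and suggest are hypothetical helpers rather than the gem's or RediSearch's implementation.

```ruby
# Classic dynamic-programming Levenshtein distance between two strings.
def levenshtein(a, b)
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost].min
    end
    previous = current
  end
  previous.last
end

# term_counts maps each indexed term to the number of documents containing
# it (made-up data). Suggestions are sorted by descending score.
def suggest(word, term_counts, total_documents, distance: 1)
  term_counts
    .select { |term, _count| term != word && levenshtein(word, term) <= distance }
    .map { |term, count| { suggestion: term, score: count.to_f / total_documents } }
    .sort_by { |entry| -entry[:score] }
end

counts = { "jimmy" => 7, "jim" => 2, "john" => 5 }
suggest("jimy", counts, 100).map { |entry| entry[:suggestion] }
# => ["jimmy", "jim"]
```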

Rails Integration

Integration with Rails is super easy! Call redi_search with a schema block from inside your model. Ex:

 class User < ApplicationRecord
   redi_search do
     text_field :first, phonetic: "dm:en"
     text_field :last, phonetic: "dm:en"
   end
 end

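This automatically adds User.search and User.spellcheck methods, which behave the same as if you called them on an Index instance.

User.reindex(recreate: false, only: []) is also added and behaves similarly to RediSearch::Index#reindex. Some of the differences include:

- Documents do not need to be passed as the first parameter. The search_import scope is called automatically and all the records are converted to Documents.
- It accepts an optional only parameter, where you can specify a limited set of fields to update. This is useful if you alter the schema and only need to index a particular field.

When defining the schema, you can optionally pass a block to each field. If no block is passed, the field's name is called on the model to get the value. If a block is passed, the value for the field is obtained by calling the block.

 class User < ApplicationRecord
   redi_search do
     text_field :name do
       "#{first_name} #{last_name}"
     end
   end
 end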
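
The block-or-method rule for obtaining a field's value can be sketched with instance_exec. field_value is a hypothetical helper and the Struct stands in for a model; evaluating the block in the record's context is an assumption of this sketch, suggested by the bare first_name and last_name calls in the block above.

```ruby
# Sketch: resolve a field's value from a record (hypothetical helper).
# With a block, evaluate it in the record's context; otherwise call the
# method named after the field.
def field_value(record, name, &block)
  block ? record.instance_exec(&block) : record.public_send(name)
end

# A Struct stands in for an ActiveRecord model here.
toy_user_class = Struct.new(:first_name, :last_name)
user = toy_user_class.new("Gene", "Volkman")

field_value(user, :first_name)                            # => "Gene"
field_value(user, :name) { "#{first_name} #{last_name}" } # => "Gene Volkman"
```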

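You can override the search_import scope on the model to eager load relationships when indexing, or to limit the records that are indexed.

 class User < ApplicationRecord
   scope :search_import, -> { includes(:posts) }
 end

When searching, a collection of Documents is returned by default. Calling #results on the search query executes the search, then looks up all the found records in the database and returns an ActiveRecord relation.

The default Index name for a model's Index is #{model_name.plural}_#{RediSearch.env}. The redi_search method takes an optional index_prefix argument, which is prepended to the index name:

 class User < ApplicationRecord
   redi_search index_prefix: 'prefix' do
     text_field :first, phonetic: "dm:en"
     text_field :last, phonetic: "dm:en"
   end
 end

 User.search_index.name
 # => prefix_users_development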
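
The naming rule is easy to reproduce outside Rails. The helper below is hypothetical and simply mirrors the rule and the example output above; it assumes the prefix is joined with an underscore, as that output suggests.

```ruby
# Hypothetical helper mirroring the index naming rule described above.
def index_name(plural_model_name, env, index_prefix: nil)
  # compact drops the prefix when none is given.
  [index_prefix, plural_model_name, env].compact.join("_")
end

index_name("users", "development")                         # => "users_development"
index_name("users", "development", index_prefix: "prefix") # => "prefix_users_development"
```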

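When RediSearch is integrated into a model, records are automatically indexed after being created or updated, and are removed from the Index upon destruction.

A few more convenience methods are publicly available:

search_document
- Returns the record as a RediSearch::Document instance
remove_from_index
- Removes the record from the Index
add_to_index
- Adds the record to the Index
search_index
- Returns the RediSearch::Index instance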

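Development

After checking out the repo, run bin/setup to install dependencies. Then run rake test to run both the unit and integration tests. To run them individually, run rake test:unit or rake test:integration. You can also run bin/console for an interactive prompt that allows you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, execute bin/publish (major|minor|patch), which updates the version number in version.rb, creates a git tag for the version, pushes the git commits and tags, and pushes the .gem file to rubygems.org and GitHub.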