A simple, but powerful, Ruby wrapper around RediSearch, a search engine on top of Redis.
Firstly, Redis and RediSearch need to be installed.
You can download Redis from https://redis.io/download, and check out installation instructions here. Alternatively, on macOS or Linux you can install via Homebrew.
To install RediSearch, check out https://oss.redislabs.com/redisearch/Quick_Start.html.
Once you have RediSearch built, if you are not using Docker, you can update your redis.conf file to always load the RediSearch module by adding the line:
loadmodule /path/to/redisearch.so
(On macOS the redis.conf file can be found at /usr/local/etc/redis.conf.)
After Redis and RediSearch are up and running, add the following line to your Gemfile:
gem 'redi_search'
And then:
❯ bundle
Or install it yourself:
❯ gem install redi_search
and require it:
require 'redi_search'
Once the gem is installed and required you'll need to configure it with your Redis configuration. If you're on Rails, this should go in an initializer (config/initializers/redi_search.rb).
RediSearch.configure do |config|
  config.redis_config = {
    host: "127.0.0.1",
    port: "6379"
  }
end
RediSearch revolves around a search index, so let's start by defining what a search index is. According to Swiftype:
A search index is a body of structured data that a search engine refers to when looking for results that are relevant to a specific query. Indexes are a critical piece of any search system, since they must be tailored to the specific information retrieval method of the search engine’s algorithm. In this manner, the algorithm and the index are inextricably linked to one another. Index can also be used as a verb (indexing), referring to the process of collecting unstructured website data in a structured format that is tailored for the search engine algorithm.
One way to think about indices is to consider the following analogy between a search infrastructure and an office filing system. Imagine you hand an intern a stack of thousands of pieces of paper (documents) and tell them to organize these pieces of paper in a filing cabinet (index) to help the company find information more efficiently. The intern will first have to sort through the papers and get a sense of all the information contained within them, then they will have to decide on a system for arranging them in the filing cabinet, then finally they’ll need to decide what is the most effective manner for searching through and selecting from the files once they are in the cabinet. In this example, the process of organizing and filing the papers corresponds to the process of indexing website content, and the method for searching across these organized files and finding those that are most relevant corresponds to the search algorithm.
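To make the filing-cabinet analogy concrete, here is a minimal sketch of an inverted index in plain Ruby. TinyIndex is a hypothetical class written for illustration only; it is not how RediSearch stores or queries data.

```ruby
# A toy inverted index: maps each lowercased term to the ids of the
# documents that contain it. Purely illustrative; not RediSearch's internals.
class TinyIndex
  def initialize
    @postings = Hash.new { |hash, term| hash[term] = [] }
  end

  # "Filing" step: tokenize the text and record the document id per term.
  def add(id, text)
    tokenize(text).uniq.each { |term| @postings[term] << id }
  end

  # "Lookup" step: ids of the documents that contain every query term.
  def search(query)
    terms = tokenize(query)
    return [] if terms.empty?

    terms.map { |term| @postings.fetch(term, []) }.reduce(:&)
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/)
  end
end

index = TinyIndex.new
index.add(1, "Hello world")
index.add(2, "Hello Redis")
index.search("hello")       # => [1, 2]
index.search("hello redis") # => [2]
```

The work happens once at indexing time, so a lookup only has to intersect a few small lists instead of scanning every document.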
A schema defines the fields and the properties of those fields in the index. It is written with a simple DSL. Each field can be one of four types (geo, numeric, tag, or text) and can have many options. A simple example of a schema is:
RediSearch::Schema.new do
  text_field :first_name
  text_field :last_name
end
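For readers curious how a block-based DSL like this can work, the following is a minimal sketch in plain Ruby. SchemaSketch is a hypothetical stand-in and not the gem's implementation; it only assumes that the block is evaluated against the schema object and that each *_field call records a name, a type, and options.

```ruby
# Hypothetical stand-in for a schema DSL; the real RediSearch::Schema
# may be implemented differently.
class SchemaSketch
  attr_reader :fields

  def initialize(&block)
    @fields = {}
    # Evaluate the block with self set to this schema, so that bare
    # calls such as `text_field :name` reach the methods defined below.
    instance_eval(&block) if block
  end

  # Define text_field, numeric_field, tag_field and geo_field.
  %i[text numeric tag geo].each do |type|
    define_method("#{type}_field") do |name, **options|
      @fields[name] = { type: type, options: options }
    end
  end
end

schema = SchemaSketch.new do
  text_field :first_name
  text_field :last_name, sortable: true
end
schema.fields.keys # => [:first_name, :last_name]
```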
The supported options for each type are as follows:
With no options: text_field :name
weight: text_field :name, weight: 2
phonetic: text_field :name, phonetic: 'dm:en'
sortable: text_field :name, sortable: true
no_index: text_field :name, no_index: true (useful in conjunction with sortable, to create fields whose update using PARTIAL will not cause full reindexing of the document; if a field has no_index and doesn't have sortable, it will just be ignored by the index)
no_stem: text_field :name, no_stem: true
With no options: numeric_field :price
sortable: numeric_field :id, sortable: true
no_index: numeric_field :id, no_index: true (useful in conjunction with sortable, to create fields whose update using PARTIAL will not cause full reindexing of the document; if a field has no_index and doesn't have sortable, it will just be ignored by the index)
With no options: tag_field :tag
sortable: tag_field :tag, sortable: true
no_index: tag_field :tag, no_index: true (useful in conjunction with sortable, to create fields whose update using PARTIAL will not cause full reindexing of the document; if a field has no_index and doesn't have sortable, it will just be ignored by the index)
separator: tag_field :tag, separator: ','
With no options: geo_field :place
sortable: geo_field :place, sortable: true
no_index: geo_field :place, no_index: true (useful in conjunction with sortable, to create fields whose update using PARTIAL will not cause full reindexing of the document; if a field has no_index and doesn't have sortable, it will just be ignored by the index)
A Document is the Ruby representation of a Redis hash.
You can fetch a Document using the .get class method. get(index, document_id) fetches a single Document in an Index for a given document_id.
You can also make a Document instance using the .for_object(index, record, only: []) class method. It takes an Index instance and a Ruby object. That object must respond to all the fields specified in the Index's Schema. only accepts an array of fields from the schema and limits the fields that are passed to the Document.
Once you have an instance of a Document, it responds to document_id and to all the fields specified in the Index's Schema as methods. document_id is automatically prepended with the Index's name, unless it already is, to ensure uniqueness. We prepend the Index name because if you have two Documents with the same id in different Indexes we don't want the Documents to override each other. There is also a #document_id_without_index method which removes the prepended index name.
Finally there is a #del method that will remove the Document from the Index.
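The id prefixing described above can be sketched in a few lines of plain Ruby. Both helper names are hypothetical, and the sketch assumes the index name is concatenated directly in front of the id with no separator; the gem's actual key format may differ.

```ruby
# Hypothetical helpers; the plain concatenation without a separator is an
# assumption made for illustration.
def prefixed_document_id(index_name, id)
  id = id.to_s
  # Leave ids alone that already carry the index name.
  id.start_with?(index_name.to_s) ? id : "#{index_name}#{id}"
end

def document_id_without_index(index_name, document_id)
  document_id.to_s.delete_prefix(index_name.to_s)
end

prefixed_document_id("user_idx", 10039)              # => "user_idx10039"
prefixed_document_id("post_idx", 10039)              # => "post_idx10039"
document_id_without_index("user_idx", "user_idx10039") # => "10039"
```

Because the two prefixed ids differ, a user and a post that share the id 10039 end up under separate Redis keys.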
To initialize an Index, pass the name of the Index as a string or symbol and the Schema block.
RediSearch::Index.new(name_of_index) do
  text_field :foobar
end
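create
  Creates the Index in the Redis instance. Will return false if the index already exists. Accepts a few options:
  max_text_fields: #{true || false}
    Forces RediSearch to encode the index as if it had more than 32 text fields, which allows you to add additional fields (beyond 32) using add_field.
  no_offsets: #{true || false}
    Does not store term offsets for documents. This saves memory but does not allow exact searches or highlighting. Implies no_highlight.
  temporary: #{seconds}
    Creates a lightweight temporary index which will expire after seconds seconds of inactivity. The internal idle timer is reset whenever the index is searched or added to. Because such indexes are lightweight, you can create thousands of such indexes without negative performance implications.
  no_highlight: #{true || false}
    Conserves storage space and memory by disabling highlighting support. no_highlight is also implied by no_offsets.
  no_fields: #{true || false}
    Does not store field bits for each term. This saves memory but does not allow filtering by specific fields.
  no_frequencies: #{true || false}
    Does not save term frequencies in the index. This saves memory but does not allow sorting based on the frequencies of a given term within the document.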
drop(keep_docs: false)
  Drops the Index from the Redis instance, returns a boolean. Has an accompanying bang method that will raise an exception upon failure. Will return false if the Index has already been dropped. Takes an optional keyword arg, keep_docs, which defaults to false, so by default all the document hashes in Redis are removed.
exist?
  Returns a boolean signifying Index existence.
info
  Returns information about the Index.
fields
  Returns the fields in the Index.
add(document)
  Takes a Document object. Has an accompanying bang method that will raise an exception upon failure.
add_multiple(documents)
  Takes an array of Document objects. This provides a more performant way to add multiple documents to the Index. Accepts the same options as add.
del(document)
  Removes a Document from the Index.
document_count
  Returns the number of Documents in the Index.
add_field(name, type, **options, &block)
  Adds a new field to the Index. Ex: index.add_field(:first_name, :text, phonetic: "dm:en")
reindex(documents, recreate: false)
  Takes an array of Document objects. If recreate is true the Index will be dropped and recreated.
Searching is initiated off a RediSearch::Index instance with clauses that can be chained together. When searching, an array of Documents is returned, each of which has public reader methods for all the schema fields.
main ❯ index = RediSearch::Index.new("user_idx") { text_field :name, phonetic: "dm:en" }
main ❯ index.add RediSearch::Document.for_object(index, User.new("10039", "Gene", "Volkman"))
main ❯ index.add RediSearch::Document.for_object(index, User.new("9998", "Jeannie", "Ledner"))
main ❯ index.search("john")
RediSearch (1.1ms) FT.SEARCH user_idx `john`
=> [#<RediSearch::Document:0x00007f862e241b78 first: "Gene", last: "Volkman", document_id: "10039">,
#<RediSearch::Document:0x00007f862e2417b8 first: "Jeannie", last: "Ledner", document_id: "9998">]
Simple phrase query - hello AND world
index.search("hello").and("world")
Exact phrase query - hello FOLLOWED BY world
index.search("hello world")
Union query - hello OR world
index.search("hello").or("world")
Negation query - hello AND NOT world
index.search("hello").and.not("world")
Complex intersections and unions:
# Intersection of unions
index.search(index.search("hello").or("halo")).and(index.search("world").or("werld"))
# Negation of union
index.search("hello").and.not(index.search("world").or("werld"))
# Union inside phrase
index.search("hello").and(index.search("world").or("werld"))
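As a rough guide to what these chains mean at the RediSearch level, the sketch below builds raw query strings by hand. The helper names are hypothetical and this is not the gem's serializer; only the resulting syntax (space for intersection, | for union, - for negation, double quotes for an exact phrase) belongs to RediSearch.

```ruby
# Hypothetical helpers that assemble raw RediSearch query strings.
def intersection(*clauses)
  clauses.join(" ")
end

def union(*clauses)
  "(#{clauses.join('|')})"
end

def negation(clause)
  "-#{clause}"
end

def exact_phrase(text)
  %("#{text}")
end

intersection("hello", "world")                                # => "hello world"
exact_phrase("hello world")                                   # => "\"hello world\""
union("hello", "world")                                       # => "(hello|world)"
intersection("hello", negation("world"))                      # => "hello -world"
intersection(union("hello", "halo"), union("world", "werld")) # => "(hello|halo) (world|werld)"
intersection("hello", negation(union("world", "werld")))      # => "hello -(world|werld)"
```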
All terms support a few options that can be applied.
Prefix terms: match all terms starting with a prefix (akin to LIKE 'term%' in SQL).
index.search("hel", prefix: true)
index.search("hello worl", prefix: true)
index.search("hel", prefix: true).and("worl", prefix: true)
index.search("hello").and.not("worl", prefix: true)
Optional terms: documents containing the optional terms will rank higher than those without
index.search("foo").and("bar", optional: true).and("baz", optional: true)
Fuzzy terms: matches are performed based on Levenshtein distance (LD). The maximum Levenshtein distance supported is 3.
index.search("zuchini", fuzziness: 1)
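In raw RediSearch query syntax these term options correspond to small decorations on the term: a trailing * for a prefix, a leading ~ for an optional term, and one % on each side per unit of Levenshtein distance. The helper below is a hypothetical sketch of that mapping for a single term, not the gem's own code.

```ruby
# Hypothetical helper that decorates one term with RediSearch modifiers.
def decorate_term(term, prefix: false, optional: false, fuzziness: 0)
  unless (0..3).cover?(fuzziness)
    raise ArgumentError, "fuzziness must be between 0 and 3"
  end

  term = "#{term}*" if prefix
  # One % on each side per unit of Levenshtein distance.
  term = "#{'%' * fuzziness}#{term}#{'%' * fuzziness}" if fuzziness.positive?
  term = "~#{term}" if optional
  term
end

decorate_term("hel", prefix: true)      # => "hel*"
decorate_term("bar", optional: true)    # => "~bar"
decorate_term("zuchini", fuzziness: 1)  # => "%zuchini%"
```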
Search terms can also be scoped to specific fields using a where clause:
# Simple field specific query
index.search.where(name: "john")
# Using where with options
index.search.where(first: "jon", fuzziness: 1)
# Using where with more complex query
index.search.where(first: index.search("bill").or("bob"))
Searching for numeric fields takes a range:
index.search.where(number: 0..100)
# Searching to infinity
index.search.where(number: 0..Float::INFINITY)
index.search.where(number: -Float::INFINITY..0)
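RediSearch itself expresses numeric filters as @field:[min max], with -inf and +inf for open ends and a leading ( for an exclusive bound. A minimal sketch of turning a Ruby Range into that form could look like this; the helper names are hypothetical, endless or beginless ranges are not handled, and the gem may serialize ranges differently.

```ruby
# Hypothetical helpers converting a Ruby Range into a RediSearch numeric
# filter clause.
def numeric_bound(value)
  return "-inf" if value == -Float::INFINITY
  return "+inf" if value == Float::INFINITY

  value.to_s
end

def numeric_clause(field, range)
  upper = numeric_bound(range.end)
  # A three-dot range (0...100) excludes its end; RediSearch marks an
  # exclusive bound with a leading parenthesis.
  upper = "(#{upper}" if range.exclude_end?
  "@#{field}:[#{numeric_bound(range.begin)} #{upper}]"
end

numeric_clause(:number, 0..100)                 # => "@number:[0 100]"
numeric_clause(:number, 0..Float::INFINITY)     # => "@number:[0 +inf]"
numeric_clause(:number, (-Float::INFINITY..0))  # => "@number:[-inf 0]"
```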
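slop(level)
  Allows at most level intervening unmatched offsets between phrase terms (the slop for exact phrases is 0).
in_order
  Usually used in conjunction with slop. We make sure the query terms appear in the same order in the Document as in the query, regardless of the offsets between them.
no_content
  Only returns the Document ids and not the content. This is useful if RediSearch is being used on a Rails model where the Document attributes don't matter and it's being converted into ActiveRecord objects.
language(language)
  Stemmer to use for the supplied language during search for query expansion. If querying Documents in Chinese, this should be set to chinese in order to properly tokenize the query terms. If an unsupported language is sent, the command returns an error.
sort_by(field, order: :asc)
  Orders the results by the value of the given field. Available orders are :asc or :desc.
limit(num, offset = 0)
  Limits the results to the specified num at the offset. The default limit is set to 10.
count
  Returns the number of Documents found in the search query.
highlight(fields: [], opening_tag: "<b>", closing_tag: "</b>")
  Formats occurrences of matched text. fields are an array of fields to be highlighted.
verbatim
  Does not use stemming for query expansion; the query terms are searched verbatim.
no_stop_words
  Does not filter stopwords from the query.
with_scores
  Includes the relative internal score of each Document. This can be used to merge results from multiple instances. This will add a score method to the returned Document instances.
return(*fields)
  Limits which fields from the Document are returned.
explain
  Returns the execution plan for the query.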
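Several of these clauses correspond to arguments of the underlying FT.SEARCH command. The sketch below assembles such an argument list for a paginated, sorted search. The method name and keyword arguments are hypothetical, and the gem's internals may order or build the arguments differently; the one detail worth noting is that LIMIT takes the offset first and the number of results second.

```ruby
# Hypothetical builder for a raw FT.SEARCH argument list.
def search_arguments(index_name, query, limit: 10, offset: 0,
                     sort_by: nil, order: :asc, no_content: false)
  arguments = ["FT.SEARCH", index_name, query]
  arguments << "NOCONTENT" if no_content
  arguments.push("SORTBY", sort_by.to_s, order.to_s.upcase) if sort_by
  # RediSearch expects the offset first, then the number of results.
  arguments.push("LIMIT", offset, limit)
end

# Third page of ten results, sorted in descending order.
search_arguments("user_idx", "john", sort_by: :name, order: :desc, offset: 20)
# => ["FT.SEARCH", "user_idx", "john", "SORTBY", "name", "DESC", "LIMIT", 20, 10]
```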
Spellchecking is initiated off a RediSearch::Index instance and provides suggestions for misspelled search terms. It takes an optional distance argument which is the maximal Levenshtein distance for spelling suggestions. It returns an array where each element contains suggestions for each search term and a normalized score based on its occurrences in the index.
main ❯ index = RediSearch::Index.new("user_idx") { text_field :name, phonetic: "dm:en" }
main ❯ index.spellcheck("jimy")
RediSearch (1.1ms) FT.SPELLCHECK user_idx jimy DISTANCE 1
=> [#<RediSearch::Spellcheck::Result:0x00007f805591c670
term: "jimy",
suggestions:
[#<struct RediSearch::Spellcheck::Suggestion score=0.0006849315068493151, suggestion="jimmy">,
#<struct RediSearch::Spellcheck::Suggestion score=0.00019569471624266145, suggestion="jim">]>]
main ❯ index.spellcheck("jimy", distance: 2).first.suggestions
RediSearch (0.5ms) FT.SPELLCHECK user_idx jimy DISTANCE 2
=> [#<struct RediSearch::Spellcheck::Suggestion score=0.0006849315068493151, suggestion="jimmy">,
#<struct RediSearch::Spellcheck::Suggestion score=0.00019569471624266145, suggestion="jim">]
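For intuition about the distance argument, here is a small standalone Levenshtein distance function in plain Ruby. It is the textbook dynamic-programming version and is independent of how RediSearch computes suggestions internally.

```ruby
# Number of single-character insertions, deletions, or substitutions
# needed to turn string a into string b.
def levenshtein(a, b)
  # previous[j] holds the distance between the processed prefix of a
  # and the first j characters of b.
  previous = (0..b.length).to_a

  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [
        previous[j] + 1,       # deletion
        current[j - 1] + 1,    # insertion
        previous[j - 1] + cost # substitution (or match)
      ].min
    end
    previous = current
  end

  previous.last
end

levenshtein("jimy", "jimmy") # => 1
levenshtein("jimy", "jim")   # => 1
```

Both suggestions in the output above are a single edit away from "jimy", which is why they already appear with the default distance of 1.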
Integration with Rails is super easy! Call redi_search with a schema block from inside your model. Ex:
class User < ApplicationRecord
  redi_search do
    text_field :first, phonetic: "dm:en"
    text_field :last, phonetic: "dm:en"
  end
end
This will automatically add User.search and User.spellcheck methods which behave the same as if you called them on an Index instance.
User.reindex(recreate: false, only: []) is also added and behaves similarly to RediSearch::Index#reindex. Some of the differences include:
  Documents do not need to be passed as the first parameter. The search_import scope is automatically called and all the records are converted to Documents.
  It accepts an optional only parameter where you can specify a limited number of fields to update. Useful if you alter the schema and only need to index a particular field.
While defining a field in the schema you can optionally pass it a block. If no block is passed, the field name will be called as a method on the model to get the value. If a block is passed, the value for the field is obtained by calling the block.
class User < ApplicationRecord
  redi_search do
    text_field :name do
      "#{first_name} #{last_name}"
    end
  end
end
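The block-or-method rule can be sketched without Rails. FieldSketch and Person are hypothetical stand-ins, and the sketch assumes the block is evaluated in the context of the record, which is what lets first_name and last_name resolve in the example above.

```ruby
# Hypothetical field definition: a name plus an optional value block.
FieldSketch = Struct.new(:name, :value_block) do
  def value_for(record)
    if value_block
      # Run the block with self set to the record.
      record.instance_exec(&value_block)
    else
      # No block: call the method named after the field.
      record.public_send(name)
    end
  end
end

Person = Struct.new(:first_name, :last_name)
person = Person.new("Gene", "Volkman")

FieldSketch.new(:first_name).value_for(person)
# => "Gene"
FieldSketch.new(:name, proc { "#{first_name} #{last_name}" }).value_for(person)
# => "Gene Volkman"
```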
You can override the search_import scope on the model to eager load relationships when indexing, or to limit the records to index.
class User < ApplicationRecord
  scope :search_import, -> { includes(:posts) }
end
When searching, by default a collection of Documents is returned. Calling #results on the search query will execute the search, and then look up all the found records in the database and return an ActiveRecord relation.
The default Index name for model Indexes is #{model_name.plural}_#{RediSearch.env}. The redi_search method takes an optional index_prefix argument which gets prepended to the index name:
class User < ApplicationRecord
  redi_search index_prefix: 'prefix' do
    text_field :first, phonetic: "dm:en"
    text_field :last, phonetic: "dm:en"
  end
end
User.search_index.name
# => prefix_users_development
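The naming rule can be restated as a tiny standalone function. index_name_for is a hypothetical helper, not part of the gem; it simply joins the optional prefix, the pluralized model name, and the environment with underscores, matching the output shown above.

```ruby
# Hypothetical helper reproducing the naming convention described above.
def index_name_for(plural_model_name, env, index_prefix: nil)
  # compact drops the prefix when none was given.
  [index_prefix, plural_model_name, env].compact.join("_")
end

index_name_for("users", "development")
# => "users_development"
index_name_for("users", "development", index_prefix: "prefix")
# => "prefix_users_development"
```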
When integrating RediSearch into a model, records will automatically be indexed after creating and updating and will be removed from the Index upon destruction.
There are a few more convenience methods that are publicly available:
search_document
  Returns the record as a RediSearch::Document instance
remove_from_index
  Removes the record from the Index
add_to_index
  Adds the record to the Index
search_index
  Returns the RediSearch::Index instance
After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run both the unit and integration tests. To run them individually you can run rake test:unit or rake test:integration. You can also run bin/console for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install. To release a new version, execute bin/publish (major|minor|patch) which will update the version number in version.rb, create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org and GitHub.
Bug reports and pull requests are welcome on GitHub. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.
The gem is available as open source under the terms of the MIT License.
Everyone interacting in the RediSearch project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.