A simple search engine implementation.
It is built for study purposes, mostly to understand the implementation details of how search engines are made, their performance trade-offs, and their structure.
The idea is to make an isomorphic application from the UI to the database system.
The application can be deployed using either Docker or the binary released on GitHub.
./searchzin -c <path-to-config>.yml
After that, open http://localhost:8080 to see the search page.
The application can be configured either through the configuration file, located by default at /etc/searchzin/config.yml, or by providing configuration keys in the form -C key=value. The second form overrides the first.
Configuration defaults:
port: 8080 # Service port
path:
  log: /var/log/searchzin # Log directory
  data: /var/lib/searchzin # Data directory
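The override order described above (defaults, then the configuration file, then -C key=value pairs) can be sketched roughly as follows. This is only an illustration: the Config struct, the Defaults and ApplyOverride functions, and the dotted key names path.log and path.data are assumptions made for the example, not searchzin's actual code, and loading the YAML file is left out.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Config mirrors the documented defaults. The struct and its field names
// are assumptions made for this sketch.
type Config struct {
	Port    int
	LogDir  string
	DataDir string
}

// Defaults returns the values listed under "Configuration defaults".
func Defaults() Config {
	return Config{Port: 8080, LogDir: "/var/log/searchzin", DataDir: "/var/lib/searchzin"}
}

// ApplyOverride applies one "key=value" pair, as passed with -C.
// The dotted key spelling (path.log, path.data) is assumed, not documented.
func ApplyOverride(cfg *Config, pair string) error {
	key, value, ok := strings.Cut(pair, "=")
	if !ok {
		return fmt.Errorf("override %q is not in key=value form", pair)
	}
	switch key {
	case "port":
		port, err := strconv.Atoi(value)
		if err != nil {
			return fmt.Errorf("invalid port %q: %w", value, err)
		}
		cfg.Port = port
	case "path.log":
		cfg.LogDir = value
	case "path.data":
		cfg.DataDir = value
	default:
		return fmt.Errorf("unknown configuration key %q", key)
	}
	return nil
}

func main() {
	cfg := Defaults() // a real run would merge the YAML file here first
	for _, pair := range []string{"port=9090", "path.data=/tmp/searchzin"} {
		if err := ApplyOverride(&cfg, pair); err != nil {
			fmt.Println("error:", err)
			return
		}
	}
	fmt.Printf("%+v\n", cfg)
}
```

Applying the overrides last is what makes the second form win over the first.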
The whole project is written in Go, using the gin framework. Dependencies are managed using dep.
Most of the project toolchain is managed by the Makefile. The important targets are:
- install: Installs needed dependencies and git hooks
- readme: Performs README.md inclusion of files
- lint: Performs linting and formatting of the code
- test: Well, compiles and runs unit tests
- build: Creates a linux distributable folder in dist
- run: Runs the code using go run
- run-dev: Creates and runs a docker container
- release: Creates a release docker image
- publish: Publishes the docker image on dockerhub using the contents of the VERSION file as the version
- publish-latest: Publishes the docker image on dockerhub with the latest tag
- watch: Performs lint and test on file modification
- func-test: Performs functional tests inside the features folder

There are 6 main components to this search engine.
Each component has a clear responsibility in the system, and all of them work together to respond to queries and document indexing requests.
The document database is responsible for storing newly created documents and assigning them ids. The constraints are:
- id, sid generation with no collisions for persistence
- ssd or hdd
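As a rough illustration of collision-free id assignment, the sketch below uses a counter protected by a mutex. DocumentStore and its methods are hypothetical names, the store is purely in-memory, and persistence to ssd or hdd is not covered. The text does not define sid, so only numeric ids are shown.

```go
package main

import (
	"fmt"
	"sync"
)

// DocumentStore is a hypothetical in-memory stand-in for the document database.
type DocumentStore struct {
	mu     sync.Mutex
	nextID uint64
	docs   map[uint64]string
}

// NewDocumentStore returns an empty store.
func NewDocumentStore() *DocumentStore {
	return &DocumentStore{docs: make(map[uint64]string)}
}

// Save stores the document body and returns its newly assigned id.
// The counter is incremented under the lock, so concurrent callers
// never receive the same id.
func (s *DocumentStore) Save(body string) uint64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.nextID++
	s.docs[s.nextID] = body
	return s.nextID
}

// Get returns the document stored under id, if any.
func (s *DocumentStore) Get(id uint64) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	body, ok := s.docs[id]
	return body, ok
}

// Len reports how many documents are stored.
func (s *DocumentStore) Len() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.docs)
}

func main() {
	store := NewDocumentStore()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			store.Save(fmt.Sprintf("document %d", n))
		}(i)
	}
	wg.Wait()
	// If two goroutines had received the same id, fewer than 100
	// documents would remain in the map.
	fmt.Println("stored documents:", store.Len())
}
```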
The index database stores a reverse (inverted) index of "terms" and documents:
- terms to document set relations
- key manipulation strategies for queries with keyword approximation

Given a new document, understands it and saves it to both the index database and the document database.
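The relationship between the two databases during indexing can be sketched as below: each new document is stored under a fresh id, and every term it contains is mapped to the set of ids of the documents that contain it. The Indexer type, the Tokenize function, and the tokenization rule (lowercase, split on anything that is not a letter or digit) are assumptions made for this example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// Indexer is a hypothetical type holding both stores in memory.
type Indexer struct {
	Docs   map[uint64]string              // document database: id -> body
	Terms  map[string]map[uint64]struct{} // index database: term -> set of ids
	nextID uint64
}

// NewIndexer returns an Indexer with empty stores.
func NewIndexer() *Indexer {
	return &Indexer{
		Docs:  make(map[uint64]string),
		Terms: make(map[string]map[uint64]struct{}),
	}
}

// Tokenize lowercases the text and splits it on any rune that is not
// a letter or a digit. This rule is an assumption of the sketch.
func Tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// Add saves the document and records each of its terms in the index.
func (ix *Indexer) Add(text string) uint64 {
	ix.nextID++
	id := ix.nextID
	ix.Docs[id] = text
	for _, term := range Tokenize(text) {
		if ix.Terms[term] == nil {
			ix.Terms[term] = make(map[uint64]struct{})
		}
		ix.Terms[term][id] = struct{}{}
	}
	return id
}

// Lookup returns the sorted ids of the documents containing term.
func (ix *Indexer) Lookup(term string) []uint64 {
	set := ix.Terms[strings.ToLower(term)]
	ids := make([]uint64, 0, len(set))
	for id := range set {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

func main() {
	ix := NewIndexer()
	ix.Add("Go Time: a podcast about Go")
	ix.Add("A podcast about search engines")
	fmt.Println(ix.Lookup("podcast"))
	fmt.Println(ix.Lookup("go"))
}
```

Storing a set per term keeps a document from being listed twice when a term repeats in its text.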
Parses the user input, written in a query language based on lucene's, and transforms it into a query plan using a tree-like data structure.
Given a query tree, optimizes it, taking into account the restrictions and the environment in which it will be executed.
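The two steps above, parsing into a tree and then rewriting that tree, can be sketched as follows. The grammar here is a deliberately tiny assumption (whitespace-separated terms joined by explicit AND and OR, with AND binding tighter), not lucene's full syntax, and Flatten stands in for a real optimizer with one simple rewrite. Node, Parse and Flatten are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is one node of the query tree. Op is "AND", "OR" or "TERM".
type Node struct {
	Op       string
	Term     string  // set only when Op is "TERM"
	Children []*Node // set only for "AND" and "OR"
}

// String renders the tree in prefix form, e.g. OR(AND(a b) c).
func (n *Node) String() string {
	if n.Op == "TERM" {
		return n.Term
	}
	parts := make([]string, len(n.Children))
	for i, c := range n.Children {
		parts[i] = c.String()
	}
	return n.Op + "(" + strings.Join(parts, " ") + ")"
}

// parser walks a whitespace-separated token list.
type parser struct {
	tokens []string
	pos    int
}

// peek returns the current token, or "" at the end of input.
func (p *parser) peek() string {
	if p.pos < len(p.tokens) {
		return p.tokens[p.pos]
	}
	return ""
}

// parseBinary parses "operand (op operand)*" into a left-deep tree.
func (p *parser) parseBinary(op string, operand func() (*Node, error)) (*Node, error) {
	left, err := operand()
	if err != nil {
		return nil, err
	}
	for p.peek() == op {
		p.pos++
		right, err := operand()
		if err != nil {
			return nil, err
		}
		left = &Node{Op: op, Children: []*Node{left, right}}
	}
	return left, nil
}

func (p *parser) parseOr() (*Node, error)  { return p.parseBinary("OR", p.parseAnd) }
func (p *parser) parseAnd() (*Node, error) { return p.parseBinary("AND", p.parseTerm) }

// parseTerm consumes a single search term.
func (p *parser) parseTerm() (*Node, error) {
	tok := p.peek()
	if tok == "" || tok == "AND" || tok == "OR" {
		return nil, fmt.Errorf("expected a term at token %d, got %q", p.pos, tok)
	}
	p.pos++
	return &Node{Op: "TERM", Term: strings.ToLower(tok)}, nil
}

// Parse turns the user input into a query tree.
func Parse(query string) (*Node, error) {
	p := &parser{tokens: strings.Fields(query)}
	root, err := p.parseOr()
	if err != nil {
		return nil, err
	}
	if p.pos != len(p.tokens) {
		return nil, fmt.Errorf("unexpected token %q", p.peek())
	}
	return root, nil
}

// Flatten merges nested nodes that share an operator, so
// AND(AND(a b) c) becomes AND(a b c). The input tree is not modified.
func Flatten(n *Node) *Node {
	if n.Op == "TERM" {
		return n
	}
	out := &Node{Op: n.Op}
	for _, child := range n.Children {
		if c := Flatten(child); c.Op == n.Op {
			out.Children = append(out.Children, c.Children...)
		} else {
			out.Children = append(out.Children, c)
		}
	}
	return out
}

func main() {
	tree, err := Parse("go AND search AND engine OR podcast")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("parsed:   ", tree)
	fmt.Println("flattened:", Flatten(tree))
}
```

Flattening is only one possible rewrite. An optimizer that is aware of its environment could, for instance, also reorder the children of an AND node by how many documents each term matches.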
Once there is a structured plan, the query retrieves the actual data from the index database. This step is performed by the executor.
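A minimal sketch of such an executor, assuming a query tree of AND, OR and TERM nodes and an index that maps each term to a list of document ids, could look like this. The Node, Index, Execute and Sorted names are hypothetical, and the set-based evaluation is chosen for clarity rather than speed.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is one node of the query tree. Op is "AND", "OR" or "TERM".
type Node struct {
	Op       string
	Term     string
	Children []*Node
}

// Index maps a term to the ids of the documents that contain it.
type Index map[string][]uint64

// Execute walks the tree and returns the set of matching document ids.
func Execute(n *Node, idx Index) map[uint64]bool {
	result := make(map[uint64]bool)
	switch n.Op {
	case "TERM":
		for _, id := range idx[n.Term] {
			result[id] = true
		}
	case "OR":
		// Union: a document matches if any child matches.
		for _, child := range n.Children {
			for id := range Execute(child, idx) {
				result[id] = true
			}
		}
	case "AND":
		// Intersection: start from the first child's matches and drop
		// every id that a later child does not match.
		for i, child := range n.Children {
			matches := Execute(child, idx)
			if i == 0 {
				result = matches
				continue
			}
			for id := range result {
				if !matches[id] {
					delete(result, id)
				}
			}
		}
	}
	return result
}

// Sorted returns the ids of a result set in ascending order.
func Sorted(set map[uint64]bool) []uint64 {
	ids := make([]uint64, 0, len(set))
	for id := range set {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

func main() {
	idx := Index{
		"go":      {1, 2},
		"search":  {2, 3},
		"podcast": {3, 4},
	}
	// (go AND search) OR podcast
	query := &Node{Op: "OR", Children: []*Node{
		{Op: "AND", Children: []*Node{
			{Op: "TERM", Term: "go"},
			{Op: "TERM", Term: "search"},
		}},
		{Op: "TERM", Term: "podcast"},
	}}
	fmt.Println(Sorted(Execute(query, idx)))
}
```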
This query language is heavily based on lucene's, to simplify design and understand what tradeoffs were made.
The current test scenario is indexing podcasts by name, content, and tags.
The base usage can be found in searchzin-example.
searchzin is available under the MIT license.