# Kroomsa

Version 1.0.0

A search engine for the curious. It is a search algorithm designed to engage users by exposing them to relevant yet interesting content during their session.

The search algorithm implemented in your website greatly influences visitor engagement. A decent implementation can significantly reduce dependency on standard search engines like Google for every query, thus increasing engagement. Traditional methods look at the terms or phrases in your query and find relevant content through syntactic matching; Kroomsa instead uses semantic matching to find content relevant to your query. There is a blog post expanding upon Kroomsa's motivation and its technical aspects.

## Installation

Run `python3 ./setup.py` in the root directory. It places the model used for vectorization in the `/vectorizer` directory.

Register your Reddit bot instances and add their credentials to `/config` under the `bot_codes` parameter in the following format: `"client_id client_secret user_agent"`, as list elements separated by `,`.

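A `bot_codes` entry in that format maps directly onto the arguments of PRAW's `Reddit` client. The snippet below is only a minimal sketch of how such an entry could be used; the placeholder credentials and the chosen subreddit are illustrative assumptions, not the project's actual code.

```python
import praw

# One bot_codes entry in the "client_id client_secret user_agent" format
# described above (placeholder values).
bot_code = "my_client_id my_client_secret kroomsa-bot/0.1"
client_id, client_secret, user_agent = bot_code.split(" ", 2)

# Instantiate a read-only Reddit client for this bot instance.
reddit = praw.Reddit(
    client_id=client_id,
    client_secret=client_secret,
    user_agent=user_agent,
)

# Quick sanity check: list a few submissions from a subreddit.
for submission in reddit.subreddit("AskReddit").hot(limit=5):
    print(submission.title)
```
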
## Scraping the data

Install the preprocessing requirements: `python3 -m pip install -r ./preprocess_requirements.txt`

To scrape the questions, run `python3 ./pre_processing/scraping/questions/scrape_questions.py`. It launches a script that scrapes the subreddits sequentially, back to their inception, and stores the submissions as JSON objects in `/pre_processing/scraping/questions/scraped_questions`. It then partitions the scraped submissions into as many equal parts as there are registered bot instances in `bot_codes`.

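As a rough illustration of that partitioning step (this is not the project's code; the helper below only shows how submissions could be split evenly across bot instances):

```python
def partition(items, n_parts):
    """Split items into n_parts roughly equal chunks, one chunk per registered bot."""
    size, rem = divmod(len(items), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < rem else 0)
        parts.append(items[start:end])
        start = end
    return parts

# Example: 10 submissions split across 3 bot instances -> chunk sizes 4, 3, 3.
chunks = partition(list(range(10)), n_parts=3)
print([len(c) for c in chunks])  # [4, 3, 3]
```
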
Once the bot instances are registered in `bot_codes`, the comments can be scraped using the partitioned submission files created while scraping the submissions: `python3 ./pre_processing/scraping/comments/scrape_comments.py`. Multiple processes are spawned that fetch the comment streams simultaneously.

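A minimal sketch of that fan-out, assuming one worker process per partition file and per bot credential; `fetch_comments`, the partition file names, and the credentials below are placeholders, not the script's real interface:

```python
from multiprocessing import Process

def fetch_comments(partition_file, bot_code):
    """Placeholder worker: stream the comments for every submission listed in
    partition_file, authenticating with the credentials in bot_code."""
    ...

if __name__ == "__main__":
    bot_codes = ["id1 secret1 agent1", "id2 secret2 agent2"]  # placeholder credentials
    workers = []
    for i, code in enumerate(bot_codes):
        p = Process(target=fetch_comments, args=(f"partition_{i}", code))
        p.start()
        workers.append(p)
    for p in workers:
        p.join()
```
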
To insert the posts and their associated comments into MongoDB, run `python3 ./pre_processing/db_insertion/insertion.py`.

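For reference, the insertion could look roughly like the pymongo sketch below, using the default database and collection names mentioned later in this README (`red` and `questions`); the connection string and document fields are placeholders:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
collection = client["red"]["questions"]  # default database and collection names

# Placeholder documents; the real script stores each scraped submission
# together with its associated comments.
posts = [
    {"id": "q1", "title": "Why is the sky blue?", "comments": ["...", "..."]},
    {"id": "q2", "title": "How do planes fly?", "comments": ["..."]},
]
collection.insert_many(posts)
```
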
For post-processing, run `python3 ./post_processing/post_processing.py`. Apart from cleaning, it also adds emojis to each submission object (this behavior is configurable).

## Building the index

To build the index, run `python3 ./index/build_index.py`. By default, it creates an exhaustive `IDMap,Flat` index, but this is configurable through `/config`.

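`IDMap,Flat` is an index factory string from FAISS, so the index is presumably built with FAISS. The sketch below shows what such an exhaustive index looks like in general; the embedding dimension, vectors, and ids are placeholders, not the project's actual configuration:

```python
import faiss
import numpy as np

d = 768  # placeholder embedding dimension
index = faiss.index_factory(d, "IDMap,Flat")

# Placeholder vectors and integer ids for the indexed submissions.
vectors = np.random.random((1000, d)).astype("float32")
ids = np.arange(1000, dtype="int64")
index.add_with_ids(vectors, ids)

# Exhaustive (brute-force) search for the 5 nearest neighbours of a query vector.
query = np.random.random((1, d)).astype("float32")
distances, neighbours = index.search(query, 5)
print(neighbours[0])
```
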
## Creating a database dump

The dump is stored in `./mongo_dump`. To create it, run the following command at the root directory: `mongodump --db <database_name> --collection <collection_name> -o ./mongo_dump` (defaults: database `red`, collection `questions`).

## Inference

Install the inference requirements: `python3 -m pip install -r ./inference_requirements.txt`

Start the server with `gunicorn -c ./gunicorn_config.py server:app`.

To run Kroomsa in demo mode, set `demo_mode` to `True` in `/config`.

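Once the server is running it can be queried over HTTP. The route, query parameter, and port below are illustrative assumptions only; the actual endpoints are defined by `server:app` and the bind address by `./gunicorn_config.py`:

```python
import requests

# Hypothetical request; replace the URL and parameters with the routes
# actually exposed by server:app.
response = requests.get(
    "http://localhost:8000/search",
    params={"query": "why is the sky blue"},
    timeout=10,
)
print(response.status_code)
print(response.text)
```
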
## Docker

To build and run the project with Docker Compose: `docker-compose build` followed by `docker-compose up`.

## License

This project is licensed under the Apache License, Version 2.0.