Follow these steps to set up and run the project:
Install PostgreSQL
admin
Configure the Project
Open the config folder in the project directory, open db.js, and update line 3: change mayanksharma to your system username.
Set Up the Database
CREATE EXTENSION vector;
Install Ollama
ollama pull snowflake-arctic-embed
Install Project Dependencies
npm install
node server.js
Install REST Client Extension
Test the API
Use the api.http file to test the API endpoints. Example request bodies:
{
"query": "your_search_query"
}
{
"title": "magazine_title",
"author": "author_name",
"category": "magazine_category",
"content": "magazine_content"
}
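As an alternative to the REST Client extension, the search request can be sketched from Node (18 or later, where fetch is global). This is a minimal sketch under assumptions: the base URL and port are hypothetical because the source does not state them, and the trailing path segment of the hybrid search endpoint is assumed to be a page number.

```javascript
// Hypothetical base URL: the README does not state the host or port.
const BASE_URL = "http://localhost:3000";

// Build the URL and fetch options for a hybrid search request.
// Assumption: the last path segment is the page number.
function buildSearchRequest(query, page = 1) {
  return {
    url: `${BASE_URL}/api/v1/magazine/hybridsearch/${page}`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ query }),
    },
  };
}

// Send the request; fetchImpl can be swapped out for testing.
async function hybridSearch(query, page = 1, fetchImpl = fetch) {
  const { url, options } = buildSearchRequest(query, page);
  const response = await fetchImpl(url, options);
  if (!response.ok) {
    throw new Error(`Search failed with status ${response.status}`);
  }
  return response.json();
}
```

Calling hybridSearch("glasgow") would post the same body as the first JSON example above.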
I have used PostgreSQL with pgvector (storing embedding vectors) and tsvector (storing content text).
Requirement: search across 1 million records.
Added Hierarchical Navigable Small World (HNSW) indexes for vector search on content embeddings. Reason: search requires high recall, which makes HNSW a better fit than IVFFlat.
Added indexes for title, author and content.
Added pagination to reduce load times.
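The indexing and pagination choices above can be sketched as SQL built from Node. This is an illustration, not the project's actual schema: the table name, column names, index names, the choice of GIN expression indexes for the text columns, and the page size are all assumptions. The HNSW syntax and the `<=>` cosine distance operator are pgvector's.

```javascript
// Hypothetical table name; the README does not show the schema.
const TABLE = "magazines";

// Index definitions: HNSW for the embedding column (pgvector),
// and assumed GIN full-text indexes for title, author and content.
const indexStatements = [
  `CREATE INDEX IF NOT EXISTS ${TABLE}_embedding_hnsw ON ${TABLE} USING hnsw (embedding vector_cosine_ops)`,
  ...["title", "author", "content"].map(
    (col) =>
      `CREATE INDEX IF NOT EXISTS ${TABLE}_${col}_fts ON ${TABLE} USING gin (to_tsvector('english', ${col}))`
  ),
];

// Build a paginated nearest-neighbour query as { text, values }.
// Pages are 1-based, so page 1 has offset 0.
function buildPagedVectorQuery(queryEmbedding, page, pageSize = 10) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be a positive integer");
  }
  const offset = (page - 1) * pageSize;
  return {
    text: `SELECT id, title FROM ${TABLE} ORDER BY embedding <=> $1::vector LIMIT $2 OFFSET $3`,
    // pgvector accepts the text form "[0.1,0.2,...]", which JSON.stringify produces.
    values: [JSON.stringify(queryEmbedding), pageSize, offset],
  };
}
```

Limiting each response to one page keeps the result set small no matter how many of the 1 million records match.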
Profile: Peak
Virtual Users: 20
Test Duration: 5 minutes
Endpoint hit: POST /api/v1/magazine/hybridsearch/1 with the queries "glasgow", "game", "business", "shubham", "food" and "modern"
Total requests sent: 10,915
Requests per second: 35.62
Avg response time: 116 ms
Two separate services are used: one for text search and one for vector search.
Embeddings are generated with the "snowflake-arctic-embed" model, pulled and served through Ollama and chosen because it is lightweight.
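Generating an embedding through a local Ollama instance might look like the sketch below. It assumes Ollama is running on its default address (localhost:11434) and uses its /api/embeddings endpoint. The function name and the injectable fetchImpl parameter are illustrative and not taken from the project.

```javascript
// Ollama's default local address; adjust if your instance differs.
const OLLAMA_URL = "http://localhost:11434/api/embeddings";
const MODEL = "snowflake-arctic-embed";

// Request an embedding vector for a piece of text.
// fetchImpl defaults to the global fetch (Node 18+) and can be replaced in tests.
async function embed(text, fetchImpl = fetch) {
  const response = await fetchImpl(OLLAMA_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: MODEL, prompt: text }),
  });
  if (!response.ok) {
    throw new Error(`Embedding request failed with status ${response.status}`);
  }
  const { embedding } = await response.json();
  return embedding; // array of numbers
}
```

The returned array can be stored in a pgvector column or used as the query vector for a similarity search.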
STEP 1: Objects common to both the vector and full-text search results are shown first.
STEP 2: Objects found only by text search follow.
STEP 3: The remaining objects, found only by vector search, come last.
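The three steps above can be sketched as a small merge function. It assumes each result object carries a unique id field and that common results keep the order from the text search; both are assumptions, since the source does not specify them.

```javascript
// Merge text and vector search results in three tiers:
// 1) found by both, 2) text only, 3) vector only.
// Assumption: every result has a unique `id` property.
function mergeResults(textResults, vectorResults) {
  const textIds = new Set(textResults.map((r) => r.id));
  const vectorIds = new Set(vectorResults.map((r) => r.id));

  const common = textResults.filter((r) => vectorIds.has(r.id));
  const textOnly = textResults.filter((r) => !vectorIds.has(r.id));
  const vectorOnly = vectorResults.filter((r) => !textIds.has(r.id));

  return [...common, ...textOnly, ...vectorOnly];
}

// Usage: id 2 appears in both lists, so it is ranked first.
const merged = mergeResults(
  [{ id: 1 }, { id: 2 }],
  [{ id: 3 }, { id: 2 }]
);
console.log(merged.map((r) => r.id)); // [ 2, 1, 3 ]
```

Using Sets for the membership checks keeps the merge linear in the number of results.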
query: vector "glasgow", returns "Celtic feast journal", which has "Scotland" written in its content
query: vector "shortbread", returns "Celtic feast journal", as "shortbread" is related to "Scotland"
query: keyword/full-text "shubham", returns "Physics Refresher", which has author name "Shubham Thorve"
query: keyword/full-text "mayank", returns "Digit Gaming", which has author name "Mayank Khurana"
query: keyword/full-text "month", returns "Dalal Street Journal", which has content "All about video games this month"
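The keyword examples above match on author names as well as content. One way such a query could be built is sketched below with PostgreSQL's to_tsvector and plainto_tsquery functions. The table name, column names, and the choice to concatenate the three fields are assumptions for illustration, not the project's actual query.

```javascript
// Build a full-text search query over title, author and content.
// Hypothetical table and column names; returns { text, values }.
function buildTextSearchQuery(query, limit = 10) {
  return {
    text: [
      "SELECT id, title, author FROM magazines",
      "WHERE to_tsvector('english', title || ' ' || author || ' ' || content)",
      "      @@ plainto_tsquery('english', $1)",
      "LIMIT $2",
    ].join("\n"),
    values: [query, limit],
  };
}
```

Because to_tsvector normalizes case, a query such as "shubham" would match an author stored as "Shubham Thorve".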