This project demonstrates several document retrieval techniques in Python: Basic, HyDE, Reciprocal Rank Fusion (RRF), Fusion Retrieval, and Sub Query Decomposition (SQD). It uses Streamlit for the user interface and various libraries for document processing and retrieval.
Clone the repository:
git clone https://github.com/yourusername/yourrepository.git
cd yourrepository
Install the required dependencies:
pip install -r requirements.txt
Run the Streamlit application:
streamlit run app.py
Upload a PDF file using the sidebar.
Select a retrieval technique from the sidebar.
Enter a query in the text input box and view the retrieved documents.
HyDE (Hypothetical Document Embeddings) retrieval generates a hypothetical document that answers the query, embeds it, and retrieves the stored documents most similar to that embedding.
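The flow can be sketched in a few lines. This is a minimal illustration, not the code in app.py: the bag-of-words `embed` function stands in for a real embedding model, and `fake_generate` is a hypothetical placeholder for the LLM call that would write the hypothetical document.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding (assumption): lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def hyde_retrieve(query, docs, generate, top_k=2):
    """Search with the embedding of a generated answer instead of the raw query."""
    hypothetical = generate(query)   # step 1: write a hypothetical answer
    query_vec = embed(hypothetical)  # step 2: embed that answer
    # step 3: rank the stored documents by similarity to the hypothetical answer
    ranked = sorted(docs, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:top_k]


def fake_generate(query):
    # Hypothetical placeholder: a real setup would prompt an LLM with the query.
    return (
        "Photosynthesis converts sunlight, water and carbon dioxide "
        "into glucose and oxygen in plant leaves."
    )


docs = [
    "Plants use sunlight and carbon dioxide to produce glucose and oxygen.",
    "The stock market closed higher on Friday.",
    "Leaves contain chlorophyll, which absorbs sunlight.",
]
hyde_results = hyde_retrieve("How do plants make food?", docs, fake_generate)
```

The query "How do plants make food?" shares almost no words with the relevant documents, while the hypothetical answer does, which is the gap HyDE is meant to close.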
Basic retrieval uses a simple similarity search to retrieve documents based on the query.
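As a rough sketch of what a simple similarity search does, the example below scores each chunk against the query with cosine similarity and returns the best matches. The word-count vectors and the names `to_vector` and `similarity_search` are assumptions for illustration; the project itself may use a vector store and a learned embedding model.

```python
import math
import re
from collections import Counter


def to_vector(text):
    """Toy vectorizer (assumption): lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_search(query, chunks, top_k=3):
    """Return the top_k (score, chunk) pairs, highest score first."""
    query_vec = to_vector(query)
    scored = [(cosine_similarity(query_vec, to_vector(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


chunks = [
    "Streamlit renders the upload widget in the sidebar.",
    "BM25 ranks documents by term frequency and document length.",
    "Vector search compares embeddings with cosine similarity.",
]
basic_results = similarity_search("How does BM25 rank documents?", chunks, top_k=2)
```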
Reciprocal Rank Fusion (RRF) combines the ranked results of multiple retrieval algorithms, scoring each document by the sum of its reciprocal ranks across the result lists, to improve overall retrieval quality.
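A minimal sketch of the scoring rule: each document receives 1 / (k + rank) from every ranked list it appears in, and the contributions are summed. The constant k = 60 is a commonly used default and an assumption here, as are the document IDs and the function name.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant (assumed default of 60).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)


# Illustrative rankings from two hypothetical retrievers
vector_ranking = ["doc_b", "doc_a", "doc_c"]
keyword_ranking = ["doc_a", "doc_d", "doc_b"]
fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
```

Because only ranks are used, the retrievers' raw scores never need to be on the same scale.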
Fusion retrieval combines vector search and BM25 search results using a weighted sum to retrieve the most relevant documents.
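The weighted sum can be sketched as follows. Vector similarities and BM25 scores live on different scales, so this example min-max normalizes each set before mixing them. The normalization step, the `alpha` weight, and the sample scores are assumptions for illustration and may differ from what the project does.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: no document is preferred over another
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_fusion(vector_scores, bm25_scores, alpha=0.5):
    """Combine two score sets: alpha weights vector search, 1 - alpha weights BM25."""
    vec = min_max_normalize(vector_scores)
    bm25 = min_max_normalize(bm25_scores)
    combined = {
        doc: alpha * vec.get(doc, 0.0) + (1 - alpha) * bm25.get(doc, 0.0)
        for doc in set(vec) | set(bm25)
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Made-up scores for three documents
vector_scores = {"doc_a": 0.82, "doc_b": 0.74, "doc_c": 0.40}
bm25_scores = {"doc_a": 2.1, "doc_b": 7.5, "doc_c": 0.3}
fusion_ranked = weighted_fusion(vector_scores, bm25_scores, alpha=0.5)
```

Setting `alpha` closer to 1 favors semantic matches from vector search; setting it closer to 0 favors exact keyword matches from BM25.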
Sub Query Decomposition (SQD) decomposes the query into sub-queries and retrieves documents for each sub-query.
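A small sketch of the idea, under stated assumptions: `decompose` splits the query on the word "and" as a hypothetical stand-in for the LLM that would normally produce the sub-queries, and `keyword_search` is a toy retriever based on word overlap.

```python
import re


def tokenize(text):
    """Lowercase word set used by the toy retriever."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def decompose(query):
    # Hypothetical stand-in: a real setup would ask an LLM for sub-queries.
    parts = re.split(r"\s+and\s+", query.strip().rstrip("?"))
    return [p.strip() for p in parts if p.strip()]


def keyword_search(sub_query, docs, top_k=1):
    """Toy retriever (assumption): rank docs by number of shared words."""
    terms = tokenize(sub_query)
    scored = [(len(terms & tokenize(d)), d) for d in docs]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_k]]


def sqd_retrieve(query, docs, top_k_per_sub_query=1):
    """Retrieve for each sub-query and merge the results without duplicates."""
    results = []
    for sub_query in decompose(query):
        for doc in keyword_search(sub_query, docs, top_k_per_sub_query):
            if doc not in results:
                results.append(doc)
    return results


docs = [
    "HyDE embeds a generated hypothetical answer.",
    "BM25 ranks documents by keyword overlap.",
    "Streamlit builds web interfaces in Python.",
]
sqd_results = sqd_retrieve("What is BM25 and what is HyDE?", docs)
```

Each sub-query pulls in its own best match, so a compound question can surface documents that a single search over the full query might rank poorly.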
Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.
This project is licensed under the MIT License. See the LICENSE file for more details.