# Documentation
The purpose of this package is to offer a convenient question-answering (RAG) system with a simple YAML-based configuration that enables interaction with multiple collections of local documents. Special attention is given to improving the components that sit on top of a basic LLM-based RAG pipeline: better document parsing, hybrid search, HyDE-enabled search, chat history, deep linking, re-ranking, the ability to customize embeddings, and more. The package is designed to work with custom Large Language Models (LLMs), whether from OpenAI or installed locally.
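As a quick illustration, such a configuration could be loaded as plain YAML. The keys below (`collections`, `path`, `embedding_model`, `llm`) are illustrative assumptions for this sketch, not the package's actual schema:

```python
import yaml  # pip install pyyaml

# Hypothetical configuration - the keys below are illustrative,
# not the package's actual schema.
CONFIG = """
collections:
  - name: engineering-docs
    path: /data/docs/engineering
    embedding_model: multilingual-e5-base
  - name: hr-policies
    path: /data/docs/hr
    embedding_model: instructor-large
llm:
  provider: openai
  model: gpt-4o-mini
"""

config = yaml.safe_load(CONFIG)
for collection in config["collections"]:
    print(collection["name"], "->", collection["path"])
```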
## Supported formats
- `.md` - divides files based on logical components such as headings, subheadings, and code blocks. Supports additional features like cleaning image links, adding custom metadata, and more (a sketch of heading-based splitting follows this list).
- `.pdf` - MuPDF-based parser.
- `.docx` - custom parser, supports nested tables.
- `Unstructured` pre-processor:
    - Support for table parsing via the open-source gmft (https://github.com/conjuncts/gmft) or Azure Document Intelligence.
    - Optional support for image parsing using the Gemini API.
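The heading-aware splitting described for `.md` files can be approximated in a few lines. This is a simplified sketch of the general technique, not the package's actual parser (which also handles image-link cleaning and custom metadata):

```python
import re

def split_markdown(text: str) -> list[str]:
    """Split markdown into chunks at headings, keeping fenced code blocks
    attached to the section they appear in."""
    chunks: list[list[str]] = []
    current: list[str] = []
    in_code = False
    for line in text.splitlines():
        if line.startswith("```"):
            in_code = not in_code  # toggle code-fence state
        # Start a new chunk at a heading, but never inside a code fence.
        if not in_code and re.match(r"^#{1,6} ", line) and current:
            chunks.append(current)
            current = []
        current.append(line)
    if current:
        chunks.append(current)
    return ["\n".join(chunk) for chunk in chunks]

doc = "# Intro\nSome text.\n\n## Usage\nMore text.\n"
for chunk in split_markdown(doc):
    print("---\n" + chunk)
```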
## Embeddings

- Supports multiple collections of documents, and filtering the results by collection.
- Embeddings can be updated incrementally, without a need to re-index the entire document base.
- Generates dense embeddings from a folder of documents and stores them in a vector database (ChromaDB). Supported embedding models include `multilingual-e5-base` and `instructor-large`.
- Generates sparse embeddings using SPLADE (https://github.com/naver/splade) to enable hybrid search (sparse + dense); a sketch of the score fusion follows this list.
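Hybrid search combines the two score sets so that lexical (SPLADE) and semantic (dense) matches reinforce each other. Below is a minimal sketch of one common fusion scheme, a weighted sum of min-max-normalized scores; the package's actual fusion logic may differ:

```python
def minmax(scores: dict[str, float]) -> dict[str, float]:
    """Min-max normalize scores to [0, 1] so the two scales are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def hybrid_scores(dense: dict[str, float], sparse: dict[str, float],
                  alpha: float = 0.5) -> dict[str, float]:
    """Weighted sum of normalized dense (embedding) and sparse (SPLADE) scores."""
    dense, sparse = minmax(dense), minmax(sparse)
    docs = dense.keys() | sparse.keys()
    return {d: alpha * dense.get(d, 0.0) + (1 - alpha) * sparse.get(d, 0.0)
            for d in docs}

dense = {"doc1": 0.92, "doc2": 0.75, "doc3": 0.40}
sparse = {"doc2": 12.3, "doc3": 8.1, "doc4": 5.0}
for doc, score in sorted(hybrid_scores(dense, sparse).items(),
                         key=lambda kv: kv[1], reverse=True):
    print(doc, round(score, 3))
```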
## Search

- Supports the "Retrieve and Re-rank" strategy for semantic search: candidates returned by the embedding search are re-scored with an `ms-marco-MiniLM` cross-encoder; the more modern `bge-reranker` is also supported. A sketch of the re-ranking step follows this list.
- Supports HyDE (Hypothetical Document Embeddings); see the sketch below.
- Supports multi-querying, inspired by RAG Fusion (https://towardsdatascience.com/forget-rag-the-future-is-rag-fusion-1147298d8ad1); see the fusion sketch below.
- Supports optional chat history with question contextualization.
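The re-ranking step re-scores the retriever's candidates with a cross-encoder, which reads the query and passage together and is therefore more accurate (but slower) than embedding search, so it is applied only to the top retrieved candidates. A sketch using the `sentence-transformers` library; the exact model variant chosen here is an assumption:

```python
from sentence_transformers import CrossEncoder  # pip install sentence-transformers

query = "How do I update embeddings incrementally?"
candidates = [
    "Embeddings can be updated incrementally without re-indexing.",
    "The .docx parser supports nested tables.",
    "SPLADE produces sparse embeddings for hybrid search.",
]

# The cross-encoder scores each (query, passage) pair jointly.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = model.predict([(query, passage) for passage in candidates])

for passage, score in sorted(zip(candidates, scores),
                             key=lambda pair: pair[1], reverse=True):
    print(f"{score:7.3f}  {passage}")
```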
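HyDE replaces the raw question with an LLM-written hypothetical answer before embedding, so the search compares answer-like text against answer-like documents. A minimal sketch with the LLM call and the embedding function stubbed out as placeholders:

```python
def call_llm(prompt: str) -> str:
    """Stub standing in for any chat-completion call (OpenAI, local model, ...)."""
    return ("Embeddings can be updated incrementally by re-processing only "
            "documents whose content has changed.")

def embed(text: str) -> list[float]:
    """Stub embedding function; a real system would use the configured model."""
    return [float(ord(c) % 7) for c in text[:8]]

def hyde_query_vector(question: str) -> list[float]:
    # 1. Ask the LLM to write a hypothetical answer. Its factual accuracy
    #    does not matter - only that it *reads like* a relevant document.
    hypothetical = call_llm(f"Write a short passage answering: {question}")
    # 2. Embed the hypothetical answer instead of the question, and use
    #    that vector for the usual dense search against the index.
    return embed(hypothetical)

print(hyde_query_vector("How do I update embeddings incrementally?"))
```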
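Multi-querying issues several LLM-generated rephrasings of the question and merges the per-query result lists. RAG Fusion does the merge with reciprocal rank fusion (RRF); a self-contained sketch of that step:

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked result lists: each document scores 1 / (k + rank) per list,
    so documents ranked well by several query variants rise to the top."""
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc in enumerate(results, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Ranked results for three LLM-generated rephrasings of the same question.
runs = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a", "doc_d"],
    ["doc_b", "doc_c", "doc_a"],
]
print(reciprocal_rank_fusion(runs))  # doc_b first: consistently ranked high
```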
## LLM support

- Allows interaction with embedded documents, internally supporting multiple models and methods, including locally hosted ones.
- Interoperability with LiteLLM + Ollama via the OpenAI API, supporting hundreds of different models (see Model configuration for LiteLLM); a sketch follows this list.
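Since LiteLLM and Ollama expose OpenAI-compatible endpoints, any OpenAI client can talk to them by overriding the base URL. The port, API key, and model name below are illustrative, not values mandated by this package:

```python
from openai import OpenAI  # pip install openai

# A LiteLLM proxy commonly listens on localhost:4000; Ollama's
# OpenAI-compatible endpoint is http://localhost:11434/v1.
# Both the URL and the model name here are examples only.
client = OpenAI(base_url="http://localhost:4000", api_key="not-needed-locally")

response = client.chat.completions.create(
    model="ollama/llama3",
    messages=[{"role": "user", "content": "Summarize the indexed documents."}],
)
print(response.choices[0].message.content)
```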
For other features, browse the full documentation.