Indox Retrieval Augmentation is an application for extracting information from a wide range of document types, including text files, PDF, HTML, Markdown, and LaTeX. Whether your documents are structured or unstructured, Indox provides a toolset for extracting the relevant data efficiently. One of its key features is the ability to cluster primary chunks into more robust groupings, which improves the quality and relevance of the extracted information. With a focus on adaptability and user-centric design, Indox has more features planned for upcoming releases. Join us in exploring how Indox can bring clarity and organization to your document processing and data retrieval workflow.
Model Support | Implemented | Description |
---|---|---|
Ollama (e.g. Llama3) | ✅ | Local Embedding and LLM Models powered by Ollama |
HuggingFace | ✅ | Local Embedding and LLM Models powered by HuggingFace |
Mistral | ✅ | Embedding and LLM Models by Mistral |
Google (e.g. Gemini) | ✅ | Embedding and Generation Models by Google |
OpenAI (e.g. GPT4) | ✅ | Embedding and Generation Models by OpenAI |
Supported Models via Indox API | Implemented | Description |
---|---|---|
OpenAI | ✅ | Embedding and LLM OpenAI models from the Indox API |
Mistral | ✅ | Embedding and LLM Mistral models from the Indox API |
Anthropic | | Embedding and LLM Anthropic models from the Indox API |
Loader and Splitter | Implemented | Description |
---|---|---|
Simple PDF | ✅ | Import PDF |
UnstructuredIO | ✅ | Import Data through Unstructured |
Clustered Load And Split | ✅ | Load PDF and text files, and add an extra clustering layer |
RAG Features | Implemented | Description |
---|---|---|
Hybrid Search | | Semantic Search combined with Keyword Search |
Semantic Caching | ✅ | Results saved and retrieved based on semantic meaning |
Clustered Prompt | ✅ | Retrieve smaller chunks, then cluster and summarize them |
Agentic RAG | ✅ | Generate more reliable answers, rank context, and search the web if needed |
Advanced Querying | | Task Delegation Based on LLM Evaluation |
Reranking | ✅ | Rerank results based on context for improved results |
Customizable Metadata | | Free control over Metadata |
Cool Bonus | Implemented | Description |
---|---|---|
Docker Support | | Indox is deployable via Docker |
Customizable Frontend | | Indox's frontend is fully customizable |
☑️ Examples | Run in Colab |
---|---|
Indox Api (OpenAi) | |
Mistral (Using Unstructured) | |
OpenAi (Using Clustered Split) | |
HuggingFace Models(Mistral) | |
Ollama | |
Evaluate with IndoxJudge | |
The following command installs the latest stable release of inDox:
pip install Indox
To install the latest development version, run:
pip install git+https://github.com/osllmai/inDox@master
Clone the repository and navigate to the directory:
git clone https://github.com/osllmai/inDox.git
cd inDox
Install the required Python packages:
pip install -r requirements.txt
If you are running this project in your local IDE, create a Python virtual environment so that all dependencies stay isolated and correctly managed. The steps below set up a virtual environment named indox.
Windows:
python -m venv indox
indox\Scripts\activate
macOS/Linux:
python3 -m venv indox
source indox/bin/activate
Once the virtual environment is activated, install the required dependencies by running:
pip install -r requirements.txt
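A quick way to confirm that the interpreter in use really belongs to the activated environment is to compare `sys.prefix` with `sys.base_prefix`. The sketch below assumes the environment was created with `python -m venv` as described above; the helper name `in_virtualenv` is made up for this example.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

If the first line prints `False`, the environment is probably not activated in the current shell or IDE.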
pip install indox
pip install openai
pip install chromadb
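After installing, it can help to check that the three packages are importable before going further. The following is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper, and the import names `indox`, `openai`, and `chromadb` are assumed to match the packages installed above.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["indox", "openai", "chromadb"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```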
To start, you need to load your API keys from the environment.
import os
from dotenv import load_dotenv
load_dotenv()
OPENAI_API_KEY = os.environ['OPENAI_API_KEY']
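Reading the key with `os.environ[...]` raises a bare `KeyError` when the variable is missing. As an optional alternative, the sketch below wraps the lookup in a small helper that fails with a clearer message. The name `require_env` is an assumption made for this example and is not part of Indox or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Treat an unset or empty variable the same way.
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "add it to your .env file or export it in your shell."
        )
    return value


# Typical use after load_dotenv():
# OPENAI_API_KEY = require_env("OPENAI_API_KEY")
```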
Import the necessary classes from the Indox package.
from indox import IndoxRetrievalAugmentation
from indox.llms import OpenAi
from indox.embeddings import OpenAiEmbedding
Create an instance of IndoxRetrievalAugmentation.
Indox = IndoxRetrievalAugmentation()
openai_qa = OpenAi(api_key=OPENAI_API_KEY, model="gpt-3.5-turbo-0125")
openai_embeddings = OpenAiEmbedding(model="text-embedding-3-small",openai_api_key=OPENAI_API_KEY)
file_path = "sample.txt"
In this section, we take advantage of the unstructured library to load documents and split them into chunks by title. This method helps organize the document into manageable sections for further processing.
from indox.data_loader_splitter import UnstructuredLoadAndSplit
loader_splitter = UnstructuredLoadAndSplit(file_path=file_path)
docs = loader_splitter.load_and_chunk()
Starting processing...
End Chunking process.
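To make the idea of chunking by title concrete, here is a simplified, standalone sketch. It is not how the unstructured library or `UnstructuredLoadAndSplit` is implemented; it assumes, purely for illustration, that a title is any line starting with `#`, and the function name `chunk_by_title` below is local to this example.

```python
def chunk_by_title(text: str) -> list[dict]:
    """Split text into sections, starting a new chunk at every title line.

    Assumption for this sketch: a title is a line that starts with '#'.
    """
    chunks = []
    current = {"title": None, "lines": []}
    for line in text.splitlines():
        if line.startswith("#"):
            # Close the previous section before opening a new one.
            if current["title"] is not None or current["lines"]:
                chunks.append(current)
            current = {"title": line.lstrip("#").strip(), "lines": []}
        elif line.strip():
            current["lines"].append(line.strip())
    if current["title"] is not None or current["lines"]:
        chunks.append(current)
    return [{"title": c["title"], "text": " ".join(c["lines"])} for c in chunks]


if __name__ == "__main__":
    sample = "# Intro\nHello world.\nMore.\n\n# Usage\nRun it."
    for chunk in chunk_by_title(sample):
        print(chunk)
```

Each resulting chunk keeps its title alongside its text, so related sentences stay together when they are later embedded.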
Storing document chunks in a vector store is crucial for enabling efficient retrieval and search operations. By converting text data into vector representations and storing them in a vector store, you can perform rapid similarity searches and other vector-based operations.
from indox.vector_stores import ChromaVectorStore
db = ChromaVectorStore(collection_name="sample", embedding=openai_embeddings)
Indox.connect_to_vectorstore(db)
Indox.store_in_vectorstore(docs)
2024-05-14 15:33:04,916 - INFO - Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.
2024-05-14 15:33:12,587 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-05-14 15:33:13,574 - INFO - Document added successfully to the vector store.
Connection established successfully.
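The similarity search that a vector store performs can be illustrated without any external service. The sketch below is not Chroma's or Indox's implementation; it uses hand-made three-dimensional vectors in place of real embeddings and ranks stored chunks by cosine similarity to a query vector. All names and values in it are invented for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the k stored texts whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy "embeddings": made-up vectors, not output of a real embedding model.
store = [
    ("chunk about the hazel tree", [0.9, 0.1, 0.0]),
    ("chunk about the festival", [0.1, 0.9, 0.0]),
    ("chunk about the golden shoe", [0.0, 0.2, 0.9]),
]

if __name__ == "__main__":
    print(top_k([1.0, 0.0, 0.0], store, k=2))
```

A real vector store follows the same principle but uses high-dimensional embeddings and indexing structures that avoid comparing the query against every stored vector.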
query = "how cinderella reach her happy ending?"
retriever = Indox.QuestionAnswer(vector_database=db, llm=openai_qa, top_k=5)
retriever.invoke(query)
2024-05-14 15:34:55,380 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-05-14 15:35:01,917 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
'Cinderella reached her happy ending by enduring mistreatment from her step-family, finding solace and help from the hazel tree and the little white bird, attending the royal festival where the prince recognized her as the true bride, and ultimately fitting into the golden shoe that proved her identity. This led to her marrying the prince and living happily ever after.'
retriever.context
["from the hazel-bush. Cinderella thanked him, went to her mother's\n\ngrave and planted the branch on it, and wept so much that the tears\n\nfell down on it and watered it. And it grew and became a handsome\n\ntree. Thrice a day cinderella went and sat beneath it, and wept and\n\nprayed, and a little white bird always came on the tree, and if\n\ncinderella expressed a wish, the bird threw down to her what she\n\nhad wished for.\n\nIt happened, however, that the king gave orders for a festival",
'worked till she was weary she had no bed to go to, but had to sleep\n\nby the hearth in the cinders. And as on that account she always\n\nlooked dusty and dirty, they called her cinderella.\n\nIt happened that the father was once going to the fair, and he\n\nasked his two step-daughters what he should bring back for them.\n\nBeautiful dresses, said one, pearls and jewels, said the second.\n\nAnd you, cinderella, said he, what will you have. Father',
'face he recognized the beautiful maiden who had danced with\n\nhim and cried, that is the true bride. The step-mother and\n\nthe two sisters were horrified and became pale with rage, he,\n\nhowever, took cinderella on his horse and rode away with her. As\n\nthey passed by the hazel-tree, the two white doves cried -\n\nturn and peep, turn and peep,\n\nno blood is in the shoe,\n\nthe shoe is not too small for her,\n\nthe true bride rides with you,\n\nand when they had cried that, the two came flying down and',
"to send her up to him, but the mother answered, oh, no, she is\n\nmuch too dirty, she cannot show herself. But he absolutely\n\ninsisted on it, and cinderella had to be called. She first\n\nwashed her hands and face clean, and then went and bowed down\n\nbefore the king's son, who gave her the golden shoe. Then she\n\nseated herself on a stool, drew her foot out of the heavy\n\nwooden shoe, and put it into the slipper, which fitted like a\n\nglove. And when she rose up and the king's son looked at her",
'slippers embroidered with silk and silver. She put on the dress\n\nwith all speed, and went to the wedding. Her step-sisters and the\n\nstep-mother however did not know her, and thought she must be a\n\nforeign princess, for she looked so beautiful in the golden dress.\n\nThey never once thought of cinderella, and believed that she was\n\nsitting at home in the dirt, picking lentils out of the ashes. The\n\nprince approached her, took her by the hand and danced with her.']
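The retrieved chunks above are the context the language model sees when it answers the query. As a rough illustration of the general retrieval-augmented pattern, the sketch below combines a question with a list of context chunks into a single prompt. This is an assumed, generic template, not the prompt Indox actually uses, and `build_prompt` is a name chosen for this example.

```python
def build_prompt(question: str, context: list[str]) -> str:
    """Combine retrieved chunks and a question into one prompt string."""
    # Number each chunk so the model (and the reader) can tell them apart.
    numbered = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(context, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    demo_context = ["First retrieved chunk.", "Second retrieved chunk."]
    print(build_prompt("how cinderella reach her happy ending?", demo_context))
```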