A GenAI Assistant based on LangChain + Streamlit + Azure Cosmos DB for MongoDB (vCore) + Docker.
Authors:
Tested on Linux Ubuntu 20.04 (may need tweaks for other systems).
Min hardware requirements solely for the AI Assistant App deployment and Vector database creation (excl. the PanKB DB, ETL and AI Assistant app):
System requirements:
The DB population process typically takes 90-150 minutes, depending on the configuration of the DEV server and of the Cosmos DB sharded cluster. The populated MongoDB collection occupies ~1.0 GiB of storage, including the indexes.
Please note the following limitations and considerations:
Create the .env file in the following format:
## Do not put this file under version control!
OPENAI_API_KEY=<insert the API key here without quotes>
COHERE_API_KEY=<insert the API key here without quotes>
TOGETHER_API_KEY=<insert the API key here without quotes>
GOOGLE_API_KEY=<insert the API key here without quotes>
ANTHROPIC_API_KEY=<insert the API key here without quotes>
REPLICATE_API_TOKEN=<insert the API key here without quotes>
VOYAGE_API_KEY=<insert the API key here without quotes>
## MongoDB-PROD (Azure Cosmos DB for MongoDB) Connection String
# Had to multiply maxIdleTimeMS by 10 to handle
# urllib3.exceptions.ProtocolError:
# ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))
MONGODB_CONN_STRING = "<insert the connection string here with quotes>"
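A missing or blank key in the .env file usually only surfaces when the first API call fails. The following is a minimal sketch of a startup check, assuming the variables have already been loaded into the process environment (for example by Docker Compose or python-dotenv). The helper name `missing_env_vars` is hypothetical and not part of this repository, and whether every key is actually needed depends on which providers the app uses.

```python
import os

# Variable names taken from the .env template above.
REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "COHERE_API_KEY",
    "TOGETHER_API_KEY",
    "GOOGLE_API_KEY",
    "ANTHROPIC_API_KEY",
    "REPLICATE_API_TOKEN",
    "VOYAGE_API_KEY",
    "MONGODB_CONN_STRING",
]


def missing_env_vars(required=REQUIRED_VARS, environ=None):
    """Return the names in `required` that are unset or blank in `environ`.

    `environ` defaults to os.environ; passing a plain dict makes the
    function easy to test. This helper is illustrative only.
    """
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name, "").strip()]
```

Calling `missing_env_vars()` at startup and aborting when the returned list is non-empty gives a clear error message instead of a failure deep inside a provider client.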
The DB population script does not have to run in a Docker container. It can be run directly on the host with the following commands:
# install all the requirements and dependencies
pip3 install -r requirements.txt
# Run the script with two command line arguments:
# the name of the folder containing the documents to feed to the LLM
# and
# the name of the MongoDB collection that will contain the vector DB
python3 make_vectordb.py ./Paper_all pankb_vector_store
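For illustration, the two positional arguments above could be parsed as in the following sketch. This is an assumption about the interface, not the actual code of make_vectordb.py, which may read its arguments differently; the names `parse_args`, `docs_dir` and `collection` are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the two positional arguments described above.

    Illustrative only: the real make_vectordb.py may handle its
    command line in another way.
    """
    parser = argparse.ArgumentParser(
        description="Build a vector store from a folder of documents."
    )
    parser.add_argument(
        "docs_dir",
        help="folder containing the documents to feed to the LLM",
    )
    parser.add_argument(
        "collection",
        help="MongoDB collection that will contain the vector DB",
    )
    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv)
```

With this sketch, `parse_args(["./Paper_all", "pankb_vector_store"])` yields an object whose `docs_dir` is `./Paper_all` and whose `collection` is `pankb_vector_store`.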
To build the Docker image and recreate the Docker container that runs the Streamlit app:
docker compose up -d --build --force-recreate
The dockerized Streamlit app does not have to run in tmux. It stays up and running even after the VM is rebooted, because the docker compose file sets the option restart: always.
The status of the docker container can be checked with the following command:
docker ps
After a successful deployment, the output should include a line similar to the following:
CONTAINER ID   IMAGE              COMMAND                  CREATED          STATUS          PORTS                                       NAMES
54d89d7c4fad   pankb_llm:latest   "streamlit run strea…"   10 minutes ago   Up 10 minutes   0.0.0.0:8501->8501/tcp, :::8501->8501/tcp   pankb-llm
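The same check can be scripted, for example in a monitoring job. The sketch below is one possible approach, not part of this repository: it relies on the `--format` option of `docker ps` to print one tab-separated name and status per container, and the helper names `docker_ps_output` and `container_is_up` are hypothetical. The container name `pankb-llm` is taken from the sample output above.

```python
import subprocess


def docker_ps_output():
    """Run `docker ps` and return one 'name<TAB>status' line per container."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def container_is_up(ps_output, name):
    """Return True if `name` is listed with a status that starts with 'Up'."""
    for line in ps_output.splitlines():
        container, _, status = line.partition("\t")
        if container == name:
            return status.startswith("Up")
    # The container is not listed at all, so it is not running.
    return False
```

On the host, `container_is_up(docker_ps_output(), "pankb-llm")` should return `True` while the app is running. Keeping the parsing separate from the subprocess call lets the logic be tested without a Docker daemon.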