This project builds a document question answering app powered by Large Language Models (LLMs) such as Falcon-7B and Dolly-v2-3B, using LangChain and the ChromaDB vector database. The app is deployed on Streamlit.
Link to app: https://document-question-answering-kedarghule.streamlit.app/
Note: Due to Streamlit's 1 GB memory limit, the app may occasionally fail and return an error. Here is a video that shows how the app works: https://drive.google.com/file/d/1nkvdqdx1eMWTZqhkyzU_2IJZgOg-uS8O/view?usp=sharing
In today's era of information overload, individuals and organizations are faced with the challenge of efficiently extracting relevant information from vast amounts of textual data. Traditional search engines often fall short in providing precise and context-aware answers to specific questions posed by users. As a result, there is a growing need for advanced natural language processing (NLP) techniques to enable accurate document question answering (DQA) systems.
The goal of this project is to develop a Document Question Answering app powered by Large Language Models (LLMs), such as Falcon-7B and Dolly-v2-3B, utilizing the LangChain platform and the ChromaDB vector database. By leveraging the capabilities of LLMs, this app aims to provide users with accurate and comprehensive answers to their questions within a given document corpus.
The app accepts .txt and .docx files. Once uploaded, a .docx file is converted to a .txt file, and the document is then loaded with LangChain's TextLoader, as sketched below.