Creating knowledge graphs from unstructured data
This application is designed to turn Unstructured data (pdfs,docs,txt,youtube video,web pages,etc.) into a knowledge graph stored in Neo4j. It utilizes the power of Large language models (OpenAI,Gemini,etc.) to extract nodes, relationships and their properties from the text and create a structured knowledge graph using Langchain framework.
Upload your files from local machine, GCS or S3 bucket or from web sources, choose your LLM model and generate knowledge graph.
By default only OpenAI and Diffbot are enabled since Gemini requires extra GCP configurations. Accoroding to enviornment we are configuring the models which is indicated by VITE_LLM_MODELS_PROD variable we can configure model based on our need. EX:
VITE_LLM_MODELS_PROD="openai_gpt_4o,openai_gpt_4o_mini,diffbot,gemini_1.5_flash"
In your root folder, create a .env file with your OPENAI and DIFFBOT keys (if you want to use both):
OPENAI_API_KEY="your-openai-key"
DIFFBOT_API_KEY="your-diffbot-key"
if you only want OpenAI:
VITE_LLM_MODELS_PROD="diffbot,openai-gpt-3.5,openai-gpt-4o"
OPENAI_API_KEY="your-openai-key"
if you only want Diffbot:
VITE_LLM_MODELS_PROD="diffbot"
DIFFBOT_API_KEY="your-diffbot-key"
You can then run Docker Compose to build and start all components:
docker-compose up --build
By default, the input sources will be: Local files, Youtube, Wikipedia ,AWS S3 and Webpages. As this default config is applied:
VITE_REACT_APP_SOURCES="local,youtube,wiki,s3,web"
If however you want the Google GCS integration, add gcs
and your Google client ID:
VITE_REACT_APP_SOURCES="local,youtube,wiki,s3,gcs,web"
VITE_GOOGLE_CLIENT_ID="xxxx"
You can of course combine all (local, youtube, wikipedia, s3 and gcs) or remove any you don't want/need.
By default,all of the chat modes will be available: vector, graph_vector, graph, fulltext, graph_vector_fulltext , entity_vector and global_vector. If none of the mode is mentioned in the chat modes variable all modes will be available:
VITE_CHAT_MODES=""
If however you want to specify the only vector mode or only graph mode you can do that by specifying the mode in the env:
VITE_CHAT_MODES="vector,graph"
Alternatively, you can run the backend and frontend separately:
cd frontend
yarn
yarn run dev
cd backend
python -m venv envName
source envName/bin/activate
pip install -r requirements.txt
uvicorn score:app --reload
To deploy the app and packages on Google Cloud Platform, run the following command on google cloud run:
# Frontend deploy
gcloud run deploy dev-frontend --set-env-vars "VITE_BACKEND_API_URL=" --set-env-vars "VITE_FRONTEND_HOSTNAME=hostname.us-central1.run.app" --set-env-vars "VITE_SEGMENT_API_URL=https://cdn.segment.com/v1/projects/4SGwdwzuDm5WkFvQtz7D6ATQlo14yjmW/settings"
source location current directory > Frontend
region : 32 [us-central 1]
Allow unauthenticated request : Yes
# Backend deploy
gcloud run deploy --set-env-vars "OPENAI_API_KEY = " --set-env-vars "DIFFBOT_API_KEY = " --set-env-vars "NEO4J_URI = " --set-env-vars "NEO4J_PASSWORD = " --set-env-vars "NEO4J_USERNAME = "
source location current directory > Backend
region : 32 [us-central 1]
Allow unauthenticated request : Yes
Env Variable Name | Mandatory/Optional | Default Value | Description |
---|---|---|---|
EMBEDDING_MODEL | Optional | all-MiniLM-L6-v2 | Model for generating the text embedding (all-MiniLM-L6-v2 , openai , vertexai) |
IS_EMBEDDING | Optional | true | Flag to enable text embedding |
KNN_MIN_SCORE | Optional | 0.94 | Minimum score for KNN algorithm |
GEMINI_ENABLED | Optional | False | Flag to enable Gemini |
GCP_LOG_METRICS_ENABLED | Optional | False | Flag to enable Google Cloud logs |
NUMBER_OF_CHUNKS_TO_COMBINE | Optional | 5 | Number of chunks to combine when processing embeddings |
UPDATE_GRAPH_CHUNKS_PROCESSED | Optional | 20 | Number of chunks processed before updating progress |
NEO4J_URI | Optional | neo4j://database:7687 | URI for Neo4j database |
NEO4J_USERNAME | Optional | neo4j | Username for Neo4j database |
NEO4J_PASSWORD | Optional | password | Password for Neo4j database |
LANGCHAIN_API_KEY | Optional | API key for Langchain | |
LANGCHAIN_PROJECT | Optional | Project for Langchain | |
LANGCHAIN_TRACING_V2 | Optional | true | Flag to enable Langchain tracing |
LANGCHAIN_ENDPOINT | Optional | https://api.smith.langchain.com | Endpoint for Langchain API |
VITE_BACKEND_API_URL | Optional | http://localhost:8000 | URL for backend API |
VITE_BLOOM_URL | Optional | https://workspace-preview.neo4j.io/workspace/explore?connectURL={CONNECT_URL}&search=Show+me+a+graph&featureGenAISuggestions=true&featureGenAISuggestionsInternal=true | URL for Bloom visualization |
VITE_REACT_APP_SOURCES | Mandatory | local,youtube,wiki,s3 | List of input sources that will be available |
VITE_CHAT_MODES | Mandatory | vector,graph+vector,graph,hybrid | Chat modes available for Q&A |
VITE_ENV | Mandatory | DEV or PROD | Environment variable for the app |
VITE_TIME_PER_PAGE | Optional | 50 | Time per page for processing |
VITE_CHUNK_SIZE | Optional | 5242880 | Size of each chunk of file for upload |
VITE_GOOGLE_CLIENT_ID | Optional | Client ID for Google authentication | |
VITE_LLM_MODELS_PROD | Optional | openai_gpt_4o,openai_gpt_4o_mini,diffbot,gemini_1.5_flash | To Distinguish models based on the Enviornment PROD or DEV |
GCS_FILE_CACHE | Optional | False | If set to True, will save the files to process into GCS. If set to False, will save the files locally |
ENTITY_EMBEDDING | Optional | False | If set to True, It will add embeddings for each entity in database |
LLM_MODEL_CONFIG_ollama_ |
Optional | Set ollama config as - model_name,model_local_url for local deployments | |
RAGAS_EMBEDDING_MODEL | Optional | openai | embedding model used by ragas evaluation framework |
docker pull ollama/ollama
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker exec -it ollama ollama run llama3
LLM_MODEL_CONFIG_ollama_
#example
LLM_MODEL_CONFIG_ollama_llama3=${LLM_MODEL_CONFIG_ollama_llama3-llama3,
http://host.docker.internal:11434}
VITE_BACKEND_API_URL=${VITE_BACKEND_API_URL-backendurl}
LLM Knowledge Graph Builder Application
Neo4j Workspace
Demo of application
For any inquiries or support, feel free to raise Github Issue