php rag

v1.1.0

A Retrieval-Augmented Generation application built with PHP

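This application uses an LLM (large language model), GPT-4o, accessed through the OpenAI API, to generate text based on user input. The user input is used to retrieve relevant information from a database, and the retrieved information is then used to generate the text. This approach combines the power of transformers with access to source documents.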
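The retrieve-then-generate flow described above can be illustrated with a minimal Python sketch. It is not the project's PHP code: the three-dimensional embeddings are hand-made toy values, and the names `retrieve`, `build_prompt`, and `DOCUMENTS` are hypothetical. In a real setup the vectors would come from an embedding model and the final prompt would be sent to the LLM.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, documents, top_k=2):
    """Return the top_k documents whose embeddings are closest to the query."""
    ranked = sorted(
        documents,
        key=lambda d: cosine_similarity(query_vec, d["embedding"]),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, retrieved):
    """Combine the retrieved snippets and the question into one LLM prompt."""
    context = "\n".join(f"- {d['text']}" for d in retrieved)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Toy data: hand-made embeddings, not produced by any real model.
DOCUMENTS = [
    {"text": "Michał Żarnecki is a programmer and lecturer.",
     "embedding": [0.9, 0.1, 0.0]},
    {"text": "Michał Żarnecki was a sound director in Polish film.",
     "embedding": [0.1, 0.9, 0.0]},
    {"text": "A recipe for apple pie.",
     "embedding": [0.0, 0.1, 0.9]},
]

query_embedding = [0.8, 0.2, 0.0]  # stand-in for an embedded user question
top = retrieve(query_embedding, DOCUMENTS, top_k=2)
prompt = build_prompt("What does Michał Żarnecki the programmer do?", top)
print(prompt)
```

The unrelated document is ranked last and left out of the prompt, so the model only sees context that is plausibly relevant to the question.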

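In this particular application, a database of over 1000 websites is searched for information related to a specific person. The real challenge is that the person being searched for, "Michał Żarnecki", appears in 2 different contexts as 2 different people with the same name. The goal is not only to find specific information, but also to understand the context and avoid mistakes such as mixing up information about the two namesakes.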

The concepts used in this application are described in more detail in my article on medium.com: https://medium.com/@michalzarnecki88/a-guide-to-using-llm-retrieval-augmented-generation-with-php-3bff25ce6616

For setup, you first need to have Docker and Docker Compose installed: https://docs.docker.com/compose/install/

Setup:

Run in the CLI: cd app/src && composer install
Set up the language model. Choose one of the following options:

"A": a free Llama3 model served through the local ollama API

"B": the OpenAI API

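Option B is simpler and requires less CPU and RAM, but it requires an OpenAI API key: https://platform.openai.com/settings/profile?tab=api-keys

Option A requires more CPU and RAM, but it runs locally through the ollama API. A GPU is recommended for this option.

Follow the instructions below for your preferred option, A or B: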

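A. Download the Llama3 model with ollama* and run the LLM locally (this option is slower and needs more resources, but works entirely in the local environment):
Ollama is included as part of docker-compose, so you can skip ahead to running docker-compose.

If you want to set up ollama on your local machine instead, use the instructions at the bottom of this file. This is not needed when using docker.

*Ollama serves LLMs through a local API: "Get up and running with large language models." https://ollama.com/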

B. Run GPT-4o through the OpenAI API (this option is faster, but requires an OpenAI API key):
B.1. Create an api_key.txt file in app/src and put your OpenAI API key in it
B.2. Use the Ada002TextEncoder.php class in app/src/loadDocuments.php by uncommenting line 9 and removing line 10

Run docker-compose:

docker-compose up

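*Tip: the script first needs to transform the source documents, which can take up to 30 minutes. To save time, remove some of the files from app/src/documents.

Wait until the containers finish setting up. You should see the following in the console logs: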

php-app | Loaded documents complete
php-app | Postgres is ready - executing command
php-app | [Sat Nov 02 11:32:28.365214 2024] [core:notice] [pid 1:tid 1] AH00094: Command line: 'apache2 -D FOREGROUND'

Open 127.0.0.1:2037 in your browser and ask your question.
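The readiness check above (waiting for specific lines in the console log) can be sketched in Python as follows. This is only an illustration: the marker strings are copied from the sample log shown above, the helper `is_ready` is hypothetical, and the exact log wording may differ between versions.

```python
# Substrings taken from the sample console log; treat them as assumptions.
READY_MARKERS = (
    "Loaded documents complete",
    "Postgres is ready - executing command",
    "apache2 -D FOREGROUND",
)


def is_ready(log_text, container="php-app"):
    """Return True once every marker has appeared on a line from the container."""
    lines = [line for line in log_text.splitlines() if line.startswith(container)]
    return all(any(marker in line for line in lines) for marker in READY_MARKERS)


sample_log = """php-app | Loaded documents complete
php-app | Postgres is ready - executing command
php-app | [Sat Nov 02 11:32:28.365214 2024] [core:notice] [pid 1:tid 1] AH00094: Command line: 'apache2 -D FOREGROUND'
"""

print(is_ready(sample_log))
```

Such a helper could be fed the captured output of docker-compose to decide when it is safe to open the browser or send API requests.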

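Usage:

Web browser

Once docker compose has finished setting up the containers, open 127.0.0.1:2037 in your browser and ask your question.

Use as an API

You can use the application as an API by sending requests like the following:

Option A, ollama: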
curl -d '{"prompt":"what is result of 2+2?"}' -H "Content-Type: application/json" -X POST http://127.0.0.1:2037/processOllama.php?api

Option B, OpenAI GPT:
curl -d '{"prompt":"what is result of 2+2?"}' -H "Content-Type: application/json" -X POST http://127.0.0.1:2037/processGpt.php?api
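The same requests can be built from a script. The sketch below uses only the Python standard library and mirrors the curl calls above (same URL, JSON body, and Content-Type header). The helper names `build_request` and `ask` are hypothetical, and because the response format is not documented here, `ask` simply returns the raw response text.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:2037"

# Endpoint names taken from the curl examples above.
ENDPOINTS = {"ollama": "processOllama.php", "gpt": "processGpt.php"}


def build_request(prompt, backend="ollama"):
    """Build a POST request equivalent to the curl examples."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{ENDPOINTS[backend]}?api",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, backend="ollama", timeout=120):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_request(prompt, backend), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


request = build_request("what is result of 2+2?", backend="gpt")
print(request.get_method(), request.full_url)
```

Calling `ask("what is result of 2+2?", backend="gpt")` requires the containers to be running. `build_request` alone can be inspected without any network access.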

CLI

Open an interactive shell in the container: docker exec -it php-app sh
Run in the CLI: php minicli rag
Ask your question

Example output:

Example 1

 ##### INPUT:
What is the result of 2 + 2?
##### RESPONSE:
The result of 2 + 2 is 4.

Example 2

 ##### INPUT: 
what is specialization of Michał Żarnecki based on his website content
##### RESPONSE:
Michał Żarnecki is a programmer and lecturer specializing in several key areas related to modern software development and data-driven technologies.
 
His expertise includes:
1. **Programming Languages**: Python, PHP, JavaScript.
2. **AI and Machine Learning**: Designing systems and solutions related to artificial intelligence and machine learning.
3. **Data Mining and Big Data**: Extracting valuable insights from large datasets. 
4. **Natural Language Processing (NLP)**: Working on systems that understand and generate human language. 
5. **Software Development Frameworks**: Utilizing various tools and frameworks such as Streamlit, TensorFlow, PyTorch, and langchain. 
6. **Database Systems**: Implementing and working with databases like PostgreSQL, Elasticsearch, Neo4j, and others.
 
His portfolio highlights projects such as an AI chatbot for analyzing company documents and a self-driving vehicle based on TensorFlow and Raspberry Pi.
Additionally, he has contributed to conferences and created e-learning courses focused on machine learning, underscoring his dual role as a developer and educator.

Example 3

 ##### INPUT: 
Is Michał Żarnecki programmer the same person as Michał Żarnecki audio engineer?

##### RESPONSE:
Based on the information provided, it appears that Michał Żarnecki the programmer and Michał Żarnecki the audio engineer are not the same person.
Here’s why:
1. **Fields of Expertise**:
 - Michał Żarnecki, the audio engineer, was a well-known operator and reżyser dźwięku (sound director) in Poland, with notable contributions to the Polish film industry, as mentioned in the Wikipedia excerpt.
 - Michal Żarnecki, the programmer, has a portfolio focused on programming in Python, PHP, and JavaScript, with projects related to AI, machine learning, data mining, and software development.
2. **Lifespan**:
 - Michał Żarnecki the audio engineer was born on November 12, 1946, and passed away on November 21, 2016.
 - The projects listed in Michał Żarnecki the programmer’s portfolio date from 2014 to 2016, which would be conflicting if he had passed away in 2016 and was actively working in those years. 
3. **Occupational Focus**:
 - The audio engineer has a career documented in film sound engineering and education.
 - The programmer’s career is centered around software development, mobile applications, ERP systems, and consulting in technology.

Given the distinct differences in their professional domains, timelines, and expertise, it is highly unlikely that they are the same individual

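Concept:

Basic concept:

More details for nerds:

Debugging

To speed up document loading, or to use more documents for better retrieval, adjust the $skipFirstN value in app/src/service/DocumentLoader.php:20

After changing PHP scripts, rebuild the docker image with these commands: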
docker-compose rm
docker rmi -f php-rag
docker-compose up

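Resources:

The websites used to fill the vector database come from the "Website Classification" dataset on Kaggle. Author: Hetul Mehta. Link: https://www.kaggle.com/datasets/hetulmehta/website-classification?resource=download

Setting up ollama locally

A.1. Download ollama from https://ollama.com/download
A.2. Download Llama 3 8B with ollama pull llama3:latest
A.3. Download the mxbai embedding model with ollama pull mxbai-embed-large
A.4. Make sure the models are downloaded and ollama is running: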

 ollama list
NAME                    	ID          	SIZE  	MODIFIED       
mxbai-embed-large:latest	468836162de7	669 MB	7 seconds ago 	
llama3:latest           	365c0bd3c000	4.7 GB	17 seconds ago
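The check in step A.4 can also be automated. The following Python sketch parses text in the format of the `ollama list` output shown above and reports which of the two required models are missing. The helper names are hypothetical, and the sample text is a shortened copy of the listing above rather than live output.

```python
# Models pulled in steps A.2 and A.3.
REQUIRED_MODELS = {"llama3:latest", "mxbai-embed-large:latest"}


def installed_models(listing):
    """Extract model names from the first column of `ollama list` output."""
    names = set()
    for line in listing.splitlines():
        fields = line.split()
        if not fields or fields[0] == "NAME":
            continue  # skip blank lines and the header row
        names.add(fields[0])
    return names


def missing_models(listing):
    """Return the required models that do not appear in the listing."""
    return REQUIRED_MODELS - installed_models(listing)


sample = """NAME                    \tID          \tSIZE  \tMODIFIED
mxbai-embed-large:latest\t468836162de7\t669 MB\t7 seconds ago
llama3:latest           \t365c0bd3c000\t4.7 GB\t17 seconds ago
"""

print(missing_models(sample))
```

To run it against a real installation, the listing could be captured with `subprocess.run(["ollama", "list"], capture_output=True, text=True).stdout` and passed to `missing_models`.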