php rag

v1.1.0

A retrieval-augmented generation (RAG) application built with PHP

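This application uses an LLM (large language model), GPT-4o accessed through the OpenAI API, to generate text from user input. The user input is used to retrieve relevant information from a database, and the retrieved information is then used to generate the text. This approach combines the power of transformers with access to source documents.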

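In this particular application, a database of more than 1,000 websites is searched for information about a specific person. The real challenge is that the person searched for, "Michał Żarnecki", appears in two different contexts as two different people who share the same name. The goal is not only to find specific information but also to understand the context and avoid mistakes such as mixing up information about two different people with the same name.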

The concepts used in this application are described in more detail in my article on medium.com: https://medium.com/@michalzarnecki88/a-guide-to-using-llm-retrieval-augmented-generation-with-php-3bff25ce6616

For setup, you first need to install Docker and Docker Compose: https://docs.docker.com/compose/install/

Setup:

Run in the CLI: cd app/src && composer install
Set up the language model by choosing one of the following options:

"A": a free model served through the local ollama API

"B": the OpenAI API

Option B is simpler and needs less CPU and RAM, but it requires an OpenAI API key: https://platform.openai.com/settings/profile?tab=api-keys
Option A needs more CPU and RAM, but it runs locally through the ollama API. A GPU is recommended for this option.

Follow the instructions below for your preferred option, A or B:

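A. Use ollama* to download the Llama3 model and run the LLM locally (this option is slower and needs more resources, but works entirely in a local environment):
Ollama is provided as part of docker-compose, so you can go straight to the "Run docker-compose" step.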

If you would rather set up ollama locally, use the instructions at the bottom of this file. This is not needed when using docker.

*Ollama provides a local API for serving LLMs: "Get up and running with large language models." https://ollama.com/

B. Run GPT-4o through the OpenAI API (this option is faster but requires an OpenAI API key):
B.1. Create an api_key.txt file in app/src and put your OpenAI API key in it.
B.2. Use the Ada002TextEncoder.php class in app/src/loadDocuments.php by uncommenting line 9 and removing line 10.
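
Step B.1 can also be done from the shell. The following is a minimal sketch, not part of the project: write_api_key is a hypothetical helper, and it assumes the application accepts a key file without a trailing newline.

```shell
# Hypothetical helper: write an OpenAI API key to a file.
# Usage: write_api_key <key> <target-file>
write_api_key() {
  key="$1"
  target="$2"
  if [ -z "$key" ]; then
    echo "no API key given" >&2
    return 1
  fi
  # A restrictive umask keeps a newly created key file private to the current user.
  # printf '%s' writes the key without a trailing newline (an assumption about
  # what the application expects).
  ( umask 077 && printf '%s' "$key" > "$target" )
}
```

For example, with the key held in an environment variable (the variable name is an assumption): write_api_key "$OPENAI_API_KEY" app/src/api_key.txt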

Run docker-compose:

docker-compose up

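*Tip: the script first has to transform the source documents, which can take as long as 30 minutes. To save some time, remove part of the documents from app/src/documents.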

Wait until the containers finish setting up. You should see the following in the console log:

php-app | Loaded documents complete
php-app | Postgres is ready - executing command
php-app | [Sat Nov 02 11:32:28.365214 2024] [core:notice] [pid 1:tid 1] AH00094: Command line: 'apache2 -D FOREGROUND'

Open 127.0.0.1:2037 in your browser and ask your question.
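
Because document loading can take a long time, waiting for the application can be scripted instead of watching the log. This is a rough sketch under assumptions: wait_for is a hypothetical helper, and it is assumed (not confirmed by the project) that the root URL answers with a success status once the app is ready.

```shell
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_for <attempts> <delay_seconds> <command...>
wait_for() {
  attempts="$1"
  delay="$2"
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Run the command quietly; stop as soon as it succeeds.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, to poll every 10 seconds for up to 30 minutes: wait_for 180 10 curl -sf http://127.0.0.1:2037/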

Usage:

Web browser

Once docker compose has finished setting up the containers, open 127.0.0.1:2037 in your browser and ask your question.

Use as an API

You can use the application as an API with the following requests:

Option A, ollama:
curl -d '{"prompt":"what is result of 2+2?"}' -H "Content-Type: application/json" -X POST "http://127.0.0.1:2037/processOllama.php?api"

Option B, OpenAI GPT:
curl -d '{"prompt":"what is result of 2+2?"}' -H "Content-Type: application/json" -X POST "http://127.0.0.1:2037/processGpt.php?api"
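
When prompts come from a script, quotes inside the prompt have to be escaped before they go into the JSON body. The sketch below wraps the two requests above; build_payload and ask are hypothetical helper names, and the escaping only covers backslashes and double quotes (prompts containing newlines are not handled).

```shell
# Hypothetical helper: build the JSON body used above, {"prompt":"..."}.
# Escapes backslashes and double quotes only.
build_payload() {
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"prompt":"%s"}' "$escaped"
}

# Hypothetical helper: POST a prompt to one of the two endpoints.
# Usage: ask processOllama.php "question"   or   ask processGpt.php "question"
ask() {
  endpoint="$1"
  prompt="$2"
  curl -s -X POST -H "Content-Type: application/json" \
    -d "$(build_payload "$prompt")" \
    "http://127.0.0.1:2037/${endpoint}?api"
}
```

For example: ask processGpt.php 'Who is "Michał Żarnecki"?'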

Command-line interface

Open an interactive shell in the container: docker exec -it php-app sh
Run in the CLI: php minicli rag
Ask your question.

Example output:

Example 1

 ##### INPUT:
What is the result of 2 + 2?
##### RESPONSE:
The result of 2 + 2 is 4.

Example 2

 ##### INPUT: 
what is specialization of Michał Żarnecki based on his website content
##### RESPONSE:
Michał Żarnecki is a programmer and lecturer specializing in several key areas related to modern software development and data-driven technologies.
 
His expertise includes:
1. **Programming Languages**: Python, PHP, JavaScript.
2. **AI and Machine Learning**: Designing systems and solutions related to artificial intelligence and machine learning.
3. **Data Mining and Big Data**: Extracting valuable insights from large datasets. 
4. **Natural Language Processing (NLP)**: Working on systems that understand and generate human language. 
5. **Software Development Frameworks**: Utilizing various tools and frameworks such as Streamlit, TensorFlow, PyTorch, and langchain. 
6. **Database Systems**: Implementing and working with databases like PostgreSQL, Elasticsearch, Neo4j, and others.
 
His portfolio highlights projects such as an AI chatbot for analyzing company documents and a self-driving vehicle based on TensorFlow and Raspberry Pi.
Additionally, he has contributed to conferences and created e-learning courses focused on machine learning, underscoring his dual role as a developer and educator.

Example 3

 ##### INPUT: 
Is Michał Żarnecki programmer the same person as Michał Żarnecki audio engineer?

##### RESPONSE:
Based on the information provided, it appears that Michał Żarnecki the programmer and Michał Żarnecki the audio engineer are not the same person.
Here’s why:
1. **Fields of Expertise**:
 - Michał Żarnecki, the audio engineer, was a well-known operator and reżyser dźwięku (sound director) in Poland, with notable contributions to the Polish film industry, as mentioned in the Wikipedia excerpt.
 - Michal Żarnecki, the programmer, has a portfolio focused on programming in Python, PHP, and JavaScript, with projects related to AI, machine learning, data mining, and software development.
2. **Lifespan**:
 - Michał Żarnecki the audio engineer was born on November 12, 1946, and passed away on November 21, 2016.
 - The projects listed in Michał Żarnecki the programmer’s portfolio date from 2014 to 2016, which would be conflicting if he had passed away in 2016 and was actively working in those years. 
3. **Occupational Focus**:
 - The audio engineer has a career documented in film sound engineering and education.
 - The programmer’s career is centered around software development, mobile applications, ERP systems, and consulting in technology.

Given the distinct differences in their professional domains, timelines, and expertise, it is highly unlikely that they are the same individual

Concept:

Basic concept:

More details for nerds:

Debugging

To speed up document loading, or to load more documents for better retrieval, adjust the $skipFirstN value in app/src/service/DocumentLoader.php:20.

After changing the PHP scripts, rebuild docker with these commands:
docker-compose rm
docker rmi -f php-rag
docker-compose up

Resources:

The websites used to populate the vector database come from the "Website Classification" dataset on Kaggle. Author: Hetul Mehta. Link: https://www.kaggle.com/datasets/hetulmehta/website-classification?resource=download

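Setting up ollama locally

A.1. Download ollama from https://ollama.com/download
A.2. Download Llama 3 8B with ollama pull llama3:latest
A.3. Download the mxbai embedding model with ollama pull mxbai-embed-large
A.4. Make sure the models are downloaded and ollama is running: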

 ollama list
NAME                    	ID          	SIZE  	MODIFIED       
mxbai-embed-large:latest	468836162de7	669 MB	7 seconds ago 	
llama3:latest           	365c0bd3c000	4.7 GB	17 seconds ago
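
The check in step A.4 can be automated by scanning the listing for both model names. This is only a sketch: has_models is a hypothetical helper that reads `ollama list` output from standard input and compares the first column against the names it is given.

```shell
# Hypothetical helper: succeed only if every model name passed as an argument
# appears in the first column of the listing read from stdin.
has_models() {
  listing=$(cat)
  for model in "$@"; do
    printf '%s\n' "$listing" |
      awk -v m="$model" '$1 == m { found = 1 } END { exit !found }' || return 1
  done
}
```

For example: ollama list | has_models llama3:latest mxbai-embed-large:latest && echo "models ready"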