php ragダウンロード - php ragソースコードのダウンロード

php rag

AI ソースコード

v1.1.0

ダウンロード

PHP で検索拡張生成アプリケーションを作成する

このアプリケーションは、ユーザー入力に基づいてテキストを生成するために、OpenAI API 経由でアクセスされる LLM (Large Language Model) GPT-4o を使用します。ユーザー入力はデータベースから関連情報を取得するために使用され、取得された情報はテキストの生成に使用されます。このアプローチは、トランスフォーマーの能力とソースドキュメントへのアクセスを組み合わせたものです。

この特定のアプリケーションでは、1000 を超える Web サイトのデータベースが特定の人物に関連する情報を検索します。ここでの本当の課題は、検索された人物「Michał Żarnecki」が 2 つの異なるコンテキストで同じ名前の 2 人の異なる人物として表示されることです。目標は、特定の情報を見つけるだけでなく、コンテキストを理解し、同じ名前の 2 人の異なる人物に関する情報を混同するなどの間違いを避けることです。

このアプリケーションで使用される概念については、medium.com の記事 https://medium.com/@michalzarnecki88/a-guide-to-using-llm-retrieval-augmented-generation-with-php-3bff25ce6616 で詳しく説明しました。

セットアップするには、まず Docker と Docker Compose https://docs.docker.com/compose/install/ をインストールする必要があります。

設定：

CLI で実行: cd app/src && composer install
言語モデルのセットアップ - 以下のオプションから選択します:OpenAI API のオプション

ローカル ollam API3 経由の無料モデルの「A」

OpenAI API を使用した「B」

オプション B はよりシンプルで、必要なリソース CPU と RAM は少なくなりますが、OpenAI API キーが必要ですhttps://platform.openai.com/settings/profile?tab=api-keysオプション A はより多くのリソース CPU と RAM を必要としますが、実行できますollam API を使用してローカルで実行します。このオプションには GPU があると便利です。

以下の優先オプション A または B の手順に従ってください。

A. ollama* を使用して Llama3 モデルをダウンロードし、LLM をローカルで実行します (このオプションはより多くのリソースを必要とするため遅くなりますが、ローカル環境で完全に動作します)。
Ollama は docker-compose の一部として提供されているため、ポイント 3 で直接 docker を実行できます。

ollam をローカルでセットアップする場合は、このファイルの下部にある手順に従ってください。ただし、docker を使用する場合は必要ありません。

*Ollama は、LLM を提供するローカル API を提供します。「大規模な言語モデルを立ち上げて実行します。」 https://ollama.com/

B. OpenAI API 経由で GPT-4o を実行します (このオプションは高速ですが、OpenAI API キーが必要です)。
B.1. app/src 内に api_key.txt ファイルを作成し、そこに OpenAI API キーを置きます
B.2. app/src/loadDocuments.php のクラスで Ada002TextEncoder.php を使用するには、9 行目のコメントを解除し、10 行目を削除します。

docker-compose を実行します。

docker-compose up

*ヒント: スクリプトは最初にソースドキュメントを変換する必要がありますが、これには 30 分かかる場合もあります。時間を節約したい場合は、app/src/documents からドキュメントの一部を削除するだけです。

コンテナーのセットアップが完了するまで待ちます。コンソールログに次の内容が表示されます。

php-app | Loaded documents complete
php-app | Postgres is ready - executing command
php-app | [Sat Nov 02 11:32:28.365214 2024] [core:notice] [pid 1:tid 1] AH00094: Command line: 'apache2 -D FOREGROUND'

ブラウザでアドレス 127.0.0.1:2037 を開いて質問してください

使用法：

ウェブブラウザ

Docker Compose がコンテナーのセットアップを完了したら、ブラウザーでアドレス 127.0.0.1:2037 を開いて質問します。

APIとして利用する

以下のようなリクエストを使用することで、アプリケーションをAPIとして利用できます。

オプション A オラマ:
curl -d '{"prompt":"what is result of 2+2?"}' -H "Content-Type: application/json" -X POST http://127.0.0.1:2037/processOllama.php?api

オプション B OpenAI GPT:
curl -d '{"prompt":"what is result of 2+2?"}' -H "Content-Type: application/json" -X POST http://127.0.0.1:2037/processGpt.php?api

CLI

docker interactive docker exec -it php-app shを実行します。
CLI で実行: php minicli rag
質問する

出力例:

例1

 ##### INPUT:
What is the result of 2 + 2?
##### RESPONSE:
The result of 2 + 2 is 4.

例 2

 ##### INPUT: 
what is specialization of Michał Żarnecki based on his website content
##### RESPONSE:
Michał Żarnecki is a programmer and lecturer specializing in several key areas related to modern software development and data-driven technologies.
 
His expertise includes:
1. **Programming Languages**: Python, PHP, JavaScript.
2. **AI and Machine Learning**: Designing systems and solutions related to artificial intelligence and machine learning.
3. **Data Mining and Big Data**: Extracting valuable insights from large datasets. 
4. **Natural Language Processing (NLP)**: Working on systems that understand and generate human language. 
5. **Software Development Frameworks**: Utilizing various tools and frameworks such as Streamlit, TensorFlow, PyTorch, and langchain. 
6. **Database Systems**: Implementing and working with databases like PostgreSQL, Elasticsearch, Neo4j, and others.
 
His portfolio highlights projects such as an AI chatbot for analyzing company documents and a self-driving vehicle based on TensorFlow and Raspberry Pi.
Additionally, he has contributed to conferences and created e-learning courses focused on machine learning, underscoring his dual role as a developer and educator.

例 3

 ##### INPUT: 
Is Michał Żarnecki programmer the same person as Michał Żarnecki audio engineer?

##### RESPONSE:
Based on the information provided, it appears that Michał Żarnecki the programmer and Michał Żarnecki the audio engineer are not the same person.
Here’s why:
1. **Fields of Expertise**:
 - Michał Żarnecki, the audio engineer, was a well-known operator and reżyser dźwięku (sound director) in Poland, with notable contributions to the Polish film industry, as mentioned in the Wikipedia excerpt.
 - Michal Żarnecki, the programmer, has a portfolio focused on programming in Python, PHP, and JavaScript, with projects related to AI, machine learning, data mining, and software development.
2. **Lifespan**:
 - Michał Żarnecki the audio engineer was born on November 12, 1946, and passed away on November 21, 2016.
 - The projects listed in Michał Żarnecki the programmer’s portfolio date from 2014 to 2016, which would be conflicting if he had passed away in 2016 and was actively working in those years. 
3. **Occupational Focus**:
 - The audio engineer has a career documented in film sound engineering and education.
 - The programmer’s career is centered around software development, mobile applications, ERP systems, and consulting in technology.

Given the distinct differences in their professional domains, timelines, and expertise, it is highly unlikely that they are the same individual

コンセプト：

基本コンセプト：

オタク向けの詳細:

デバッグ

ドキュメントの読み込みを高速化するか、より多くのドキュメントを使用して取得を改善するには、app/src/service/DocumentLoader.php:20 の $skipFirstN 値を操作します。

PHP スクリプトを変更した後、次のコマンドを使用して Docker を再構築します。
docker-compose rm
docker rmi -f php-rag
docker-compose up

リソース：

ベクターデータベースを埋めるために使用されるウェブサイトは、Kaggle の「ウェブサイト分類」データセットから取得されています。著者: Hetul Mehta リンク: https://www.kaggle.com/datasets/hetulmehta/website-classification?resource=download

ollamをローカルでセットアップする

A.1. https://ollama.com/downloadから ollam をダウンロードします。
A.2. Llama 3 8B をollama pull llama3:latest
A.3. mxbai 埋め込みモデルollama pull mxbai-embed-largeをダウンロード
A.4.モデルがダウンロードされ、ollam が実行されていることを確認してください

 ollama list
NAME                    	ID          	SIZE  	MODIFIED       
mxbai-embed-large:latest	468836162de7	669 MB	7 seconds ago 	
llama3:latest           	365c0bd3c000	4.7 GB	17 seconds ago