llama3 playground
1.0.0
A fully functional, ready-to-run environment for fine-tuning the Llama 3 model with a custom dataset and running inference on the fine-tuned models.
Note: This has so far only been tested on NVIDIA RTX 2080 and NVIDIA Tesla T4 GPUs. It has not been tested with other GPU classes or on CPUs.
Run this command on your host machine to check which NVIDIA GPU you have installed.
nvidia-smi
This should display your GPU information.
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 2080        Off | 00000000:01:00.0  On |                  N/A |
| 22%   38C    P8              17W / 215W |    197MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
git clone https://github.com/amithkoujalgi/llama3-playground.git
cd llama3-playground
bash build.sh
bash run.sh
This will start the Docker container with the following services.
| Service | Externally accessible endpoint | Internal port | Description |
|---|---|---|---|
| Supervisor | http://localhost:8884 | 9001 | For running training on a custom dataset and viewing the logs of the trainer process |
| FastAPI server | http://localhost:8883/docs | 8070 | For accessing the APIs of the model server |
| JupyterLab server | http://localhost:8888/lab | 8888 | For accessing the JupyterLab interface to browse the container and update/experiment with the code |
Note: All the processes (OCR, training, and inference) use the GPU, so running more than one of them at the same time will lead to out-of-memory (OOM) errors. To work around this, the system is designed to run only one process at any given point in time (i.e., only one instance of OCR, training, or inference can run at a time).
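If you build your own automation on top of the playground, one way to honor this single-process constraint is to wrap every GPU-bound step in an exclusive file lock. The sketch below is a hypothetical illustration, not the playground's actual mechanism; the lock-file path and helper names are made up:

```python
import fcntl

# Hypothetical lock file; the playground's real coordination mechanism may differ.
LOCK_PATH = "/tmp/llama3-playground.gpu.lock"

def acquire_gpu_lock():
    """Return a locked file handle, or None if another process holds the GPU."""
    fd = open(LOCK_PATH, "w")
    try:
        # Non-blocking exclusive lock: fail fast instead of queuing up.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd
    except BlockingIOError:
        fd.close()
        return None

def release_gpu_lock(fd):
    """Release the lock so the next OCR/training/inference run can start."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    fd.close()
```

A caller would check `acquire_gpu_lock()` before launching OCR, training, or inference, and skip (or retry later) if it returns `None`.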
Feel free to update the code according to your needs.
To run training, go to a terminal and type:
playground --train
Go to a terminal and type:
playground -l
This generates models under /app/data/trained-models/. The trainer script produces 2 models, one of them with the lora-adapters suffix.

To run OCR:
cd /app/llama3_playground/core
python ocr.py -f "/app/sample.pdf"
To understand what the options mean, go to JupyterLab and run python ocr.py -h.
To run inference with RAG:
cd /app/llama3_playground/core
python infer_rag.py \
  -m "llama-3-8b-instruct-custom-1720802202" \
  -d "/app/data/ocr-runs/123/text-result.txt" \
  -q "What is the employer name, address, telephone, TIN, tax year end, type of business, plan name, Plan Sequence Number, Trust ID, Account number, is it a new plan or existing plan as true or false, are elective deferrals and roth deferrals allowed as true or false, are loans permitted as true or false, are life insurance investments permitted and what is the eligibility Service Requirement selected?" \
  -t 256 \
  -e "Alibaba-NLP/gte-base-en-v1.5" \
  -p "There are checkboxes in the text that denote the value as selected if the text is [Yes], and unselected if the text is [No]. The checkbox option's value can either be before the selected value or after. Keep this in context while responding and be very careful and precise in picking these values. Always respond as JSON. Keep the responses precise and concise."
To understand what the options mean, go to JupyterLab and run python infer_rag.py -h.
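If you script these runs from Python (for instance from a JupyterLab notebook inside the container), the invocation above can be assembled programmatically. The helper below is a convenience sketch, not part of the repository; pass the resulting list to `subprocess.run(cmd, cwd="/app/llama3_playground/core")`:

```python
def build_infer_rag_cmd(model, doc, question, max_new_tokens=256,
                        embedding_model="Alibaba-NLP/gte-base-en-v1.5",
                        prompt=None):
    """Assemble the infer_rag.py command line shown above.

    This is an illustrative wrapper around the CLI flags documented by
    `python infer_rag.py -h`; the defaults here simply echo the example.
    """
    cmd = ["python", "infer_rag.py",
           "-m", model,               # trained model name
           "-d", doc,                 # OCR'd text document to retrieve from
           "-q", question,            # the question to answer
           "-t", str(max_new_tokens), # generation budget
           "-e", embedding_model]     # embedding model for retrieval
    if prompt:
        cmd += ["-p", prompt]         # optional system/prompt guidance
    return cmd
```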
If you don't have the NVIDIA Container Toolkit installed on your host, you will need to install it.
# Configure the production repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# Optionally, configure the repository to use experimental packages
sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
# Update the packages list from the repository
sudo apt-get update
# Install the NVIDIA Container Toolkit packages
sudo apt-get install -y nvidia-container-toolkit
For other environments, refer to this.
curl --silent -X 'POST' \
  'http://localhost:8883/api/infer/sync/ctx-text' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model_name": "llama-3-8b-instruct-custom-1720690384",
    "context_data": "You are a magician who goes by the name Magica",
    "question_text": "Who are you?",
    "prompt_text": "Respond in a musical and Shakespearean tone",
    "max_new_tokens": 50
  }' | jq -r ".data.response"
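The same call can be made from Python using only the standard library. The payload fields and the `.data.response` path mirror the curl example above; the helper names themselves are illustrative, and the call obviously requires the playground container to be running on localhost:8883:

```python
import json
from urllib.request import Request, urlopen

API_URL = "http://localhost:8883/api/infer/sync/ctx-text"

def make_infer_payload(model_name, context_data, question_text,
                       prompt_text="", max_new_tokens=50):
    """Build the JSON body expected by the sync inference endpoint."""
    return {
        "model_name": model_name,
        "context_data": context_data,
        "question_text": question_text,
        "prompt_text": prompt_text,
        "max_new_tokens": max_new_tokens,
    }

def infer(payload):
    """POST the payload and return the model's response text (.data.response)."""
    req = Request(API_URL,
                  data=json.dumps(payload).encode("utf-8"),
                  headers={"Content-Type": "application/json",
                           "accept": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)["data"]["response"]
```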
curl -X 'POST' \
  'http://localhost:8883/api/ocr/sync/pdf' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@your_file.pdf;type=application/pdf'
This returns true if an OCR process is running, and false otherwise.

curl -X 'GET' \
  'http://localhost:8883/api/ocr/status' \
  -H 'accept: application/json'
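When scripting OCR jobs, you can poll this status endpoint until the GPU is free. A small standard-library sketch, assuming the endpoint returns a bare JSON boolean as described above; the retry parameters and the injectable `probe` argument are my own additions for testability:

```python
import json
import time
from urllib.request import urlopen

STATUS_URL = "http://localhost:8883/api/ocr/status"

def ocr_busy():
    """Return True while an OCR process is running on the server."""
    with urlopen(STATUS_URL) as resp:
        return json.load(resp) is True

def wait_until_ocr_free(poll_seconds=5, timeout_seconds=600, probe=None):
    """Block until the status endpoint reports False, or raise on timeout.

    `probe` defaults to the real HTTP check; injecting a fake makes the
    loop testable without a running server.
    """
    probe = probe or ocr_busy
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if not probe():
            return
        time.sleep(poll_seconds)
    raise TimeoutError("OCR still busy after %s seconds" % timeout_seconds)
```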
References: