A fully-contained, ready-to-run environment to fine-tune the Llama 3 model with a custom dataset and run inference on the fine-tuned models.
Note: This has been tested only on NVIDIA RTX 2080 and NVIDIA Tesla T4 GPUs so far. It hasn't been tested with other GPU classes or on CPUs.
Run this command on your host machine to check which NVIDIA GPU you have installed.

```shell
nvidia-smi
```
That should display your GPU info:

```
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 2080        Off | 00000000:01:00.0  On |                  N/A |
| 22%   38C    P8              17W / 215W |    197MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
```
```shell
git clone https://github.com/amithkoujalgi/llama3-playground.git
cd llama3-playground

bash build.sh

bash run.sh
```
This starts the Docker container with the following services:

Service | Externally accessible endpoint | Internal Port | Description
---|---|---|---
Supervisor | http://localhost:8884 | 9001 | For running training on a custom dataset and viewing the logs of the trainer process
FastAPI Server | http://localhost:8883/docs | 8070 | For accessing the APIs of the model server
JupyterLab Server | http://localhost:8888/lab | 8888 | For accessing the JupyterLab interface to browse the container and update/experiment with the code
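Once the container is up, a quick way to confirm the services are reachable is to hit each endpoint. A minimal sketch in Python (using `requests`, which you'd need installed on the host); the URLs are the ones listed in the table above:

```python
# Minimal sketch: confirm each documented service endpoint responds.
import requests

endpoints = {
    "Supervisor": "http://localhost:8884",
    "FastAPI Server": "http://localhost:8883/docs",
    "JupyterLab Server": "http://localhost:8888/lab",
}
for name, url in endpoints.items():
    try:
        status = requests.get(url, timeout=5).status_code
        print(f"{name}: HTTP {status}")
    except requests.RequestException as exc:
        print(f"{name}: unreachable ({exc})")
```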
Note: All the processes (OCR, training and inference) use the GPU, and if more than one process of any type ran simultaneously, we would run into out-of-memory (OOM) issues. To handle that, the system is designed to run only one process at any given point in time (i.e., only one instance of OCR, training or inference can run at a time).
Feel free to update the code according to your needs.
To run training on your custom dataset, go to the terminal and type:

```shell
playground --train
```
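Training runs under Supervisor, so you can watch it at http://localhost:8884 (see the table above). As a sketch, assuming the Supervisor instance also exposes its standard XML-RPC endpoint (`/RPC2`) at the mapped port, you could check process states programmatically:

```python
# Hedged sketch: query Supervisor's standard XML-RPC API for process states.
# Assumes /RPC2 is exposed at the mapped port 8884; adjust if your setup differs.
import xmlrpc.client

server = xmlrpc.client.ServerProxy("http://localhost:8884/RPC2")
for proc in server.supervisor.getAllProcessInfo():
    print(f"{proc['name']}: {proc['statename']}")
```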
Go to the terminal and type:

```shell
playground -l
```
This produces models under `/app/data/trained-models/`. The trainer script produces 2 models:

- `lora-adapters`
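If you want to enumerate the trained models from a script or notebook inside the container, a minimal sketch using the documented output path:

```python
# Sketch: list trained model folders under the documented output directory.
from pathlib import Path

models_dir = Path("/app/data/trained-models")
for model in sorted(models_dir.iterdir()):
    print(model.name)
```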
Run OCR:

```shell
cd /app/llama3_playground/core

python ocr.py -f "/app/sample.pdf"
```
To understand what the options mean, go to JupyterLab and run `python ocr.py -h`.
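If you'd rather drive OCR from Python (e.g., from a JupyterLab notebook) than from the shell, a sketch that shells out to the same CLI shown above:

```python
# Sketch: invoke the documented OCR CLI from Python. Only the -f flag shown
# here is taken from the README; see `python ocr.py -h` for the full options.
import subprocess

result = subprocess.run(
    ["python", "ocr.py", "-f", "/app/sample.pdf"],
    cwd="/app/llama3_playground/core",
    capture_output=True,
    text=True,
)
print(result.stdout)
```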
Inference with RAG:

```shell
cd /app/llama3_playground/core

python infer_rag.py \
  -m "llama-3-8b-instruct-custom-1720802202" \
  -d "/app/data/ocr-runs/123/text-result.txt" \
  -q "What is the employer name, address, telephone, TIN, tax year end, type of business, plan name, Plan Sequence Number, Trust ID, Account number, is it a new plan or existing plan as true or false, are elective deferrals and Roth deferrals allowed as true or false, are loans permitted as true or false, are life insurance investments permitted and what is the Eligibility Service Requirement selected?" \
  -t 256 \
  -e "Alibaba-NLP/gte-base-en-v1.5" \
  -p "There are checkboxes in the text that denote the value as selected if the text is [Yes], and unselected if the text is [No]. The checkbox option's value can either be before the selected value or after. Keep this in context while responding and be very careful and precise in picking these values. Always respond as JSON. Keep the responses precise and concise."
```
To understand what the options mean, go to JupyterLab and run `python infer_rag.py -h`.
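Likewise, a sketch that drives `infer_rag.py` from Python. The flags are the ones from the example above, but the `-q` and `-p` strings here are shortened, illustrative placeholders:

```python
# Sketch: call the documented RAG inference CLI from Python.
# The -q and -p values below are abbreviated placeholders; use your own.
import subprocess

cmd = [
    "python", "infer_rag.py",
    "-m", "llama-3-8b-instruct-custom-1720802202",
    "-d", "/app/data/ocr-runs/123/text-result.txt",
    "-q", "What is the employer name?",
    "-t", "256",
    "-e", "Alibaba-NLP/gte-base-en-v1.5",
    "-p", "Always respond as JSON. Keep the responses precise and concise.",
]
result = subprocess.run(cmd, cwd="/app/llama3_playground/core",
                        capture_output=True, text=True)
print(result.stdout)
```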
This is needed only if you do not have the NVIDIA Container Toolkit installed on your host machine.
```shell
# Configure the production repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Optionally, configure the repository to use experimental packages
sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Update the packages list from the repository
sudo apt-get update

# Install the NVIDIA Container Toolkit packages
sudo apt-get install -y nvidia-container-toolkit
```
For other environments, refer to the official NVIDIA Container Toolkit installation guide.
Infer synchronously using context text:

```shell
curl --silent -X 'POST' \
  'http://localhost:8883/api/infer/sync/ctx-text' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "model_name": "llama-3-8b-instruct-custom-1720690384",
  "context_data": "You are a magician who goes by the name Magica",
  "question_text": "Who are you?",
  "prompt_text": "Respond in a musical and Shakespearean tone",
  "max_new_tokens": 50
}' | jq -r ".data.response"
```
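The same call from Python, as a sketch with `requests`; the endpoint and payload fields are exactly the ones from the curl example above:

```python
# Sketch: the same synchronous inference call via Python requests.
import requests

payload = {
    "model_name": "llama-3-8b-instruct-custom-1720690384",
    "context_data": "You are a magician who goes by the name Magica",
    "question_text": "Who are you?",
    "prompt_text": "Respond in a musical and Shakespearean tone",
    "max_new_tokens": 50,
}
resp = requests.post("http://localhost:8883/api/infer/sync/ctx-text",
                     json=payload, timeout=300)
print(resp.json()["data"]["response"])
```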
Run OCR on a PDF file:

```shell
curl -X 'POST' \
  'http://localhost:8883/api/ocr/sync/pdf' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@your_file.pdf;type=application/pdf'
```
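And the equivalent upload from Python, as a sketch with `requests` (`your_file.pdf` is a placeholder, as in the curl example):

```python
# Sketch: upload a PDF to the documented OCR endpoint via Python requests.
import requests

with open("your_file.pdf", "rb") as f:
    resp = requests.post(
        "http://localhost:8883/api/ocr/sync/pdf",
        files={"file": ("your_file.pdf", f, "application/pdf")},
        timeout=600,
    )
print(resp.json())
```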
Get the OCR status. Returns `true` if any OCR process is running, `false` otherwise.

```shell
curl -X 'GET' \
  'http://localhost:8883/api/ocr/status' \
  -H 'accept: application/json'
```
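Because only one GPU-bound process can run at a time (see the note above), a client may want to wait for any in-flight OCR run to finish before submitting a new file. A hedged sketch that polls the status endpoint, assuming it returns a bare JSON boolean as described:

```python
# Sketch: poll the documented OCR status endpoint until no OCR run is active.
import time
import requests

while requests.get("http://localhost:8883/api/ocr/status", timeout=10).json():
    time.sleep(5)  # an OCR process is running; wait before submitting another
print("No OCR process running; safe to submit a new request.")
```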