Qizhen Medical Model & MedCopilot

QiZhenGPT

QiZhenGPT: An Open Source Chinese Medical Large Language Model

This project uses a Chinese medical instruction data set built from the Qizhen Medical Knowledge Base to instruction-fine-tune the Chinese-LLaMA-Plus-7B, CaMA-13B, and ChatGLM-6B models, which substantially improves their performance in Chinese medical scenarios. We first released an evaluation data set for drug knowledge Q&A, and plan to optimize Q&A on diseases, surgeries, tests, etc., and to expand to applications such as doctor-patient Q&A and automatic generation of medical records.

MedCopilot

MedCopilot is a smart medical assistant built on the Qizhen Medical Large Model, the Qizhen Medical Knowledge Base, and medical clinical data. It aims to provide comprehensive intelligent support for patients, doctors, and hospital management. By integrating advanced artificial intelligence technology with rich medical knowledge and clinical data, MedCopilot is intended to become a new productive force in the medical industry.

MedCopilot has been officially launched in the Second Affiliated Hospital of Zhejiang University.

Update record

[2024/08/09] Updated MedCopilot-related information;

[2023/06/27] Open-sourced the QiZhen medical large model trial version (QiZhen-CaMA-13B-Checkpoint-12400), which aims to improve the accuracy of knowledge Q&A on diseases and drugs in the medical field;

[2023/06/09] Open-sourced the QiZhen medical large model trial version (QiZhen-CaMA-13B-Checkpoint-6000), which aims to improve the accuracy of drug knowledge Q&A in the medical field;

[2023/06/02] Open-sourced the QiZhen medical large model trial version (QiZhen-CaMA-13B-Checkpoint-3600), which aims to improve the accuracy of drug knowledge Q&A in the medical field;

[2023/05/30] Open-sourced 20K training samples (drawn from the real doctor-patient knowledge Q&A data collected and compiled by the Qizhen Medical Knowledge Base, and from instruction data constructed by applying specific question templates to semi-structured data from the knowledge base's drug text knowledge);

[2023/05/30] Open-sourced the QiZhen medical large model trial version (QiZhen-ChatGLM-6B-Checkpoint-2500), which aims to improve the accuracy of drug knowledge Q&A in the medical field;

[2023/05/25] Open-sourced the drug indication evaluation data set;

[2023/05/24] Open-sourced the QiZhen medical large model trial version (QiZhen-Chinese-LLaMA-7B-Checkpoint-6000), which aims to improve the accuracy of drug knowledge Q&A in the medical field;

[2023/05/23] Open-sourced the QiZhen medical large model trial version (QiZhen-Chinese-LLaMA-7B-Checkpoint-3500), which aims to improve the accuracy of drug knowledge Q&A in the medical field;

MedCopilot feature details

1. Function list assistant

MedCopilot is deeply integrated with the HIS and the electronic medical record system. It uses the Qizhen Medical Large Model to analyze data from each system and automatically summarize the doctor's key work items for the day:

Statistics and analysis of admitted patients;
Statistics and analysis of surgical conditions;
Statistics and analysis of consultation situations;
Statistics and analysis of document writing conditions;
Statistics and analysis of key patients;

2. Auxiliary diagnosis and treatment assistant

MedCopilot combines Qizhen's medical knowledge base and patient clinical data to provide personalized diagnosis and treatment recommendations to help doctors make more accurate medical decisions.

3. Medical Quality Assistant

In accordance with the national medical quality policy, MedCopilot monitors medical process data in real time, detects and corrects potential problems in a timely manner, and improves the overall medical quality.

4. Medical records assistant

MedCopilot comprehensively analyzes patient diagnosis and treatment data and automatically generates medical record documents that comply with regulations, reducing doctors' repetitive work and improving work efficiency.

5. Other functions

Research Assistant: Interpretation of Papers
Health assistant: report interpretation, chronic disease management

QiZhenGPT details

Instruction data set construction

At present, most open source ChatLLM projects use instruction data generated by other models (such as ChatGPT), which inevitably suffers from hallucination. Hallucinated data seriously limits the application and extension of LLMs in real-world scenarios. Therefore, to improve the accuracy of knowledge Q&A in the medical field, this project constructs its instruction data set as follows:

Real doctor-patient knowledge Q&A data: collected by the Qizhen Medical Knowledge Base (diseases, drugs, examinations, surgeries, prognosis, food, etc.), totaling 560K instruction samples;
Drug knowledge data: based on the drug text knowledge of the Qizhen Medical Knowledge Base, instructions are constructed by applying specific question templates to semi-structured data (for example, "What are the applicable diseases of {drug}?"), totaling 180K instruction samples;
Disease knowledge data: based on the disease text knowledge of the Qizhen Medical Knowledge Base, instructions are constructed by applying specific question templates to semi-structured data (for example, "What are the typical symptoms of {disease}?"), totaling 298K instruction samples;
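
The template-based construction described above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the record layout (`name`, `indications`), the function name `build_instructions`, and the sample record are assumptions made for the example; only the question template is taken from the text.

```python
from typing import Dict, List

# Question templates keyed by the record field they ask about.
# The template text is quoted from the description above; the field
# name "indications" is an assumption for this sketch.
DRUG_TEMPLATES: Dict[str, str] = {
    "indications": "What are the applicable diseases of {drug}?",
}


def build_instructions(record: Dict[str, str],
                       templates: Dict[str, str]) -> List[Dict[str, str]]:
    """Turn one semi-structured drug record into instruction/output pairs."""
    samples = []
    for field, template in templates.items():
        answer = record.get(field, "").strip()
        if not answer:
            # Skip fields for which the record has no text.
            continue
        samples.append({
            "instruction": template.format(drug=record["name"]),
            "output": answer,
        })
    return samples


# Hypothetical record used only for illustration.
record = {
    "name": "febuxostat",
    "indications": "Long-term treatment of hyperuricemia in patients with gout.",
}
samples = build_instructions(record, DRUG_TEMPLATES)
print(samples[0]["instruction"])
```

Because the answers are copied from knowledge base text rather than generated by another model, each instruction pair stays tied to a source record.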

Training details

QiZhen-Chinese-LLaMA-7B-Checkpoint-3500: instruction fine-tuned from Chinese-LLaMA-Plus-7B on 7 A800 (80G) GPUs. This release contains the LoRA weights at step 3500 of training (23h50min of training);
QiZhen-Chinese-LLaMA-7B-Checkpoint-6000: instruction fine-tuned from Chinese-LLaMA-Plus-7B on 7 A800 (80G) GPUs. This release contains the LoRA weights at step 6000 of training (40h56min of training);
QiZhen-ChatGLM-6B-Checkpoint-2500: instruction fine-tuned from ChatGLM-6B on 7 A800 (80G) GPUs. This release contains the LoRA weights at step 2500 of training (16h20min of training);
QiZhen-CaMA-13B-Checkpoint-3600: instruction fine-tuned from CaMA-13B on 7 A800 (80G) GPUs. This release contains the LoRA weights at step 3600 of training (37h37min of training);
QiZhen-CaMA-13B-Checkpoint-6000: instruction fine-tuned from CaMA-13B on 7 A800 (80G) GPUs. This release contains the LoRA weights at step 6000 of training (54h30min of training);
QiZhen-CaMA-13B-Checkpoint-12400: instruction fine-tuned from CaMA-13B on 6 A800 (80G) GPUs. This release contains the LoRA weights at step 12400 of training (114h46min of training).
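
For a rough sense of training cost, the step counts and durations listed above can be converted to average seconds per step. This is a back-of-the-envelope sketch that assumes each reported duration covers exactly the listed number of steps; the helper name `seconds_per_step` is illustrative.

```python
def seconds_per_step(steps: int, hours: int, minutes: int) -> float:
    """Average wall-clock seconds per training step."""
    return (hours * 3600 + minutes * 60) / steps


# (checkpoint, steps, hours, minutes) as listed in the training details.
RUNS = [
    ("QiZhen-Chinese-LLaMA-7B-Checkpoint-3500", 3500, 23, 50),
    ("QiZhen-Chinese-LLaMA-7B-Checkpoint-6000", 6000, 40, 56),
    ("QiZhen-ChatGLM-6B-Checkpoint-2500", 2500, 16, 20),
    ("QiZhen-CaMA-13B-Checkpoint-3600", 3600, 37, 37),
    ("QiZhen-CaMA-13B-Checkpoint-6000", 6000, 54, 30),
    ("QiZhen-CaMA-13B-Checkpoint-12400", 12400, 114, 46),
]

for name, steps, hours, minutes in RUNS:
    print(f"{name}: {seconds_per_step(steps, hours, minutes):.1f} s/step")
```

The figures are only comparable within a model family, since batch size and LoRA configuration per run are not stated here.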

Model download

Model	Instruction data set	Base model	LoRA download
QiZhen-Chinese-LLaMA-7B-Checkpoint-3500	740K	Chinese-LLaMA-Plus-7B	Baidu Netdisk
QiZhen-Chinese-LLaMA-7B-Checkpoint-6000	740K	Chinese-LLaMA-Plus-7B	Baidu Netdisk
QiZhen-ChatGLM-6B-Checkpoint-2500	740K	ChatGLM-6B	Baidu Netdisk
QiZhen-CaMA-13B-Checkpoint-3600	740K	CaMA	Baidu Netdisk
QiZhen-CaMA-13B-Checkpoint-6000	740K	CaMA	Baidu Netdisk
QiZhen-CaMA-13B-Checkpoint-12400	1038K	CaMA	Baidu Netdisk
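
The instruction data set sizes in this table are consistent with the three sources listed under instruction data set construction. The check below assumes (the source does not say so explicitly) that the 740K checkpoints use the doctor-patient and drug data, and that the 1038K checkpoint adds the disease data.

```python
# Sizes in thousands of samples, from the instruction data set construction list.
DOCTOR_PATIENT_QA = 560
DRUG_KNOWLEDGE = 180
DISEASE_KNOWLEDGE = 298

# Assumed composition of the two data set sizes that appear in the table.
drug_stage_total = DOCTOR_PATIENT_QA + DRUG_KNOWLEDGE
full_total = drug_stage_total + DISEASE_KNOWLEDGE

print(f"{drug_stage_total}K, {full_total}K")
```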

Quick Start

QiZhen-Chinese-LLaMA-7B

Install the environment:

pip install -r requirements.txt

Obtain Chinese-LLaMA-Plus-7B (see here for details);
Download the LoRA weights and place them in the lora directory;
Run the scripts/merge_llama_plus.sh script:

sh scripts/merge_llama_plus.sh

Set the model path parameters in gradio_chinese-llama_demo.py;
Start the demo:

python gradio_chinese-llama_demo.py
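
For readers who want to load the weights outside the Gradio demo, the following is a minimal sketch. It assumes the merged base model is stored in Hugging Face format and that the lora directory is a PEFT adapter (containing adapter_config.json); neither is confirmed by this document, and the project's demo script may load the model differently. The function names and directory arguments are hypothetical.

```python
import os


def check_lora_dir(lora_dir: str) -> bool:
    """Return True if lora_dir looks like a PEFT adapter directory."""
    # PEFT adapters keep their settings in adapter_config.json.
    return os.path.isfile(os.path.join(lora_dir, "adapter_config.json"))


def load_with_lora(base_model_dir: str, lora_dir: str):
    """Load a merged base model and attach LoRA weights (assumed PEFT format)."""
    # Imported lazily so check_lora_dir works without these packages installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    if not check_lora_dir(lora_dir):
        raise FileNotFoundError(f"no adapter_config.json in {lora_dir}")

    tokenizer = AutoTokenizer.from_pretrained(base_model_dir)
    model = AutoModelForCausalLM.from_pretrained(base_model_dir)
    # Wrap the base model with the LoRA adapter weights.
    model = PeftModel.from_pretrained(model, lora_dir)
    model.eval()
    return tokenizer, model
```

A call would look like `load_with_lora("path/to/merged-base-model", "lora")`, with both paths replaced by the local directories.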

QiZhen-ChatGLM-6B

Install the environment:

pip install -r requirements.txt

Obtain ChatGLM-6B (see here for details);
Download the LoRA weights and place them in the lora directory;
Set the model path parameters in gradio_chatglm_demo.py;
Start the demo:

python gradio_chatglm_demo.py

QiZhen-CaMA-13B-Checkpoint-3600

Install the environment:

pip install -r requirements.txt

Obtain CaMA (see here for details);
Download the LoRA weights and place them in the lora directory;
Set the model path parameters in gradio_cama_demo.py;
Start the demo:

python gradio_cama_demo.py

Preliminary research

The Qizhen Medical Large Model follows a "data + knowledge" dual-drive technical route. By tightly integrating large model technology with the medical knowledge base and the practical needs of medical application scenarios, iterative research and development will be carried out in areas such as medical data management, medical reasoning, doctor-patient Q&A, automatic generation of medical records, and clinical decision support systems, to facilitate the adoption of large model technology in the medical industry. The picture below is a case of "Qizhen Medical Knowledge Base assists large model to generate answers". This technology was released after testing.

Comparison of model effects

Test 1

Input: What diseases can febuxostat treat?
ChatGLM: Febuxostat is a new drug used to treat asthma and is a β2-receptor agonist. It has been approved for the treatment of the following diseases: 1. Asthma: Febuxostat is a β2-receptor agonist that can stimulate β2-receptors, thereby relieving asthma symptoms, including dyspnea, shortness of breath, and chest tightness. 2. COPD: Febuxostat can also be used to relieve symptoms caused by COPD, including shortness of breath, shortness of breath, and cough. 3. Allergic rhinitis: Febuxostat can relieve symptoms caused by allergic rhinitis, including nasal congestion, runny nose, and sneezing. 4. Other diseases: Febuxostat is also used to treat other diseases, such as carotenemia and skin scratching. It should be noted that febuxostat is a drug whose therapeutic effect and safety vary from person to person. It is best to use it under the guidance of a doctor.
ChatGPT: Febuxostat is a drug. My knowledge is current as of 2021 and does not include the latest medical research and drug developments. As far as I know, febuxostat is a platelet aggregation inhibitor commonly used to prevent and treat thrombotic disorders. It is used to prevent heart disease, cerebrovascular disease (such as stroke), venous thromboembolism (such as deep vein thrombosis and pulmonary embolism), and other conditions related to blood clots.

However, since I am unable to provide the most up-to-date medical information, I recommend that you consult a medical professional or pharmacist to obtain the most up-to-date information about febuxostat and to understand exactly its indications and effectiveness in treating specific medical conditions.
QiZhen-Chinese-LLaMA-7B- Checkpoint-3500: This product is used for the long-term treatment of hyperuricemia in patients with gout.
This product is a prescription drug. It must be prescribed by a doctor according to the condition and used according to the doctor's instructions, including usage, dosage, medication time, etc. Do not use medication by yourself according to the instructions on the medication.
QiZhen-CaMA-13B-Checkpoint-3600: This product is used for the long-term treatment of hyperuricemia in patients with gout.

Test 2

Input: Indications of Fluphenazine Decanoate Injection
ChatGLM: Fluphenazine Decanoate Injection is a drug with the following indications: 1. Used to treat neuropsychiatric symptoms such as depression, anxiety, and insomnia; 2. Used to treat epilepsy, brain trauma, and cerebral arteriosclerosis and other neurological diseases; 3. Used to relieve chronic pain, such as dysmenorrhea, arthritis and other painful diseases; 4. Used to improve learning ability and memory ability, and can be used for the intellectual development and improvement of learning effects of children and adolescents; 5. Used For preventing HIV infection and treating AIDS. Fluphenazine Decanoate Injection is an oral medication commonly used to treat depression, anxiety, and insomnia. A treatment plan needs to be formulated based on the doctor's advice. When using Fluphenazine Decanoate Injection, you should follow your doctor's advice and the medication instructions and dosage in the drug instructions.
ChatGPT: Flumazenil Decanoate Injection (Flumazenil Injection) is a drug used to treat sedative or hypnotic states caused by overdose or overuse of benzodiazepines (such as diazepam drugs). It acts as a benzodiazepine receptor antagonist and can reverse the central nervous system depressant effects of benzodiazepines. Fluphenazine Decanoate Injection is mainly used for the following indications:

1. Overdose of benzodiazepines: When patients experience symptoms such as coma, drowsiness, and respiratory depression due to overdose or overuse of diazepam drugs, fluphenazine decanoate injection can be used to reverse these symptoms.

2. Reversal during anesthesia: Fluphenazine Decanoate Injection can be used to reverse the sedative effects of benzodiazepines after surgery or other anesthetic procedures and help patients return to a conscious state.

3. Diagnostic use: Fluphenazine Decanoate Injection can also be used to assist in the diagnosis of benzodiazepine poisoning or overdose.

Please note that fluphenazine decanoate injection should be used rationally and monitored by a professional physician based on specific circumstances. Before using this medication, carefully read and follow the drug instructions and follow your doctor's instructions and recommendations.
QiZhen-Chinese-LLaMA-7B- Checkpoint-3500: This product is used for schizophrenia, mania, etc. It can also be used for myotonic pain.
QiZhen-CaMA-13B-Checkpoint-3600: This product is used for various manifestations of schizophrenia.

Test 3

Input: What are the main functions of Tibetan Green Fruit Tea?
ChatGLM: Tibetan green fruit tea is a natural drink, mainly made of green fruits, tea leaves, honey and other materials. Its main functions include: 1. Improve immunity: Tibetan green fruit tea is rich in nutrients, such as vitamin C, vitamin E, etc., which can enhance the body's immunity, improve the body's resistance, and prevent the occurrence of diseases. 2. Promote digestion: Tibetan green fruit tea is rich in fiber, which can promote gastrointestinal motility, help digestion, and prevent constipation. 3. Antioxidant effect: The green fruit extract in Tibetan green fruit tea has a strong antioxidant effect, which can remove free radicals in the body, delay aging, and prevent diseases. 4. Reduce blood lipids: The tea extract in Tibetan green fruit tea has a certain effect in lowering blood lipids and can prevent and treat hyperlipidemia. 5. Improve sleep: The honey in Tibetan green fruit tea has a certain sedative effect, which can help relieve stress, improve sleep quality, and improve the body's self-healing ability.
ChatGPT: Tibetan green fruit tea is a tea drink made with Tibetan green fruit as the main raw material. As far as I know, it has the following main functions:

1. Antioxidant effect: Tibetan green fruit is rich in a variety of natural antioxidants, such as polyphenols and vitamin C, which can help remove free radicals in the body, slow down cell aging, and enhance immunity.

2. Promote digestion: Tibetan green fruit tea has a good digestive stimulating effect, helps promote gastrointestinal motility, improves digestive function, and relieves stomach discomfort and indigestion.

3. Provide nutrition: Tibetan green fruit is rich in vitamin C, vitamin E, vitamin B complex, minerals and other nutrients. Drinking Tibetan green fruit tea can provide the body with the nutrients it needs.

4. Weight loss: Tibetan green fruit tea is considered to have a weight loss effect. It can promote metabolism, accelerate fat decomposition and burning, and help reduce weight and fat accumulation.

5. Anti-inflammatory and analgesic: Tibetan green fruit tea contains some ingredients with anti-inflammatory and analgesic effects, which can relieve discomforts such as joint pain, muscle soreness, and headaches.

6. Improve sleep: Tibetan green fruit tea helps relax the nervous system, has a calming and soothing effect, can improve sleep quality and relieve insomnia problems.

It should be noted that the above effects are mainly based on the traditional uses of Tibetan Green Fruit Tea and certain research results. The specific effects vary from person to person and may be affected by individual differences and usage methods. If you have a specific health condition or need to take medication, it is recommended to seek the advice of a physician or health care professional.
QiZhen-Chinese-LLaMA-7B- Checkpoint-3500: clear away heat, detoxify, produce body fluids and quench thirst. It can relieve symptoms such as fever, thirst, and sore throat.
QiZhen-CaMA-13B-Checkpoint-3600: Tibetan green fruit tea has the effects of clearing heat, soothing the throat, reducing swelling and relieving pain.
Tibetan green fruit tea can be used to relieve symptoms such as redness and pain in the throat, dry and burning throat, hoarseness and other symptoms.

Test 4

Input: What are the clinical manifestations of Reiter syndrome?
ChatGLM: Reiter syndrome (Reiter syndrome type I) is a congenital heart murmur whose clinical manifestations include:
1. Heart murmur: The characteristic heart murmur of Reiter syndrome type I is a right-to-left shunt murmur caused by a patent foramen ovale, which is usually characterized by variability in murmur intensity, location, and frequency.
2. Pulmonary hypertension: People with Reiter syndrome type I often have pulmonary hypertension, which can lead to respiratory failure and heart failure.
3. Growth and Development Abnormalities: Most people with Reiter syndrome type I will show growth and development abnormalities such as short stature, malnutrition, and gigantism in childhood and adolescence.
4. Structural abnormalities of the heart: Although murmurs are the primary clinical manifestation of Reiter syndrome type I, some patients may also have other structural abnormalities of the heart, such as heart valve disease and arrhythmias.
5. Other symptoms: People with Reiter syndrome type I may also have other symptoms, such as fatigue, difficulty breathing, chest pain, cough, bloating, and constipation. These symptoms may be related to their heart murmur and pulmonary hypertension, but they may also be related to other factors.
  It should be noted that Reiter syndrome type I is a rare disease, and its clinical manifestations may vary depending on individual differences. If you suspect that you have Reiter syndrome type I, please consult your doctor promptly for diagnosis and treatment.
ChatGPT: Reiter syndrome refers to a disease caused by achalasia of the lower esophageal sphincter and weakened esophageal motility caused by gastroesophageal reflux disease (GERD). Its main clinical manifestations include:
1. Dysphagia: The patient feels stuck when swallowing and needs to swallow several times to swallow the food.
2. Chest pain: People with Reiter's syndrome often experience chest pain, especially after eating or lying down.
3. Belching and acid reflux: Patients often experience belching and acid reflux, especially after eating or lying down.
4. Hoarseness: The patient's voice may become hoarse due to irritation of the mouth and throat areas by reflux material.
5. Sore throat: reflux material irritates the throat mucosa.
  It is important to note that the symptoms of Reiter syndrome are similar to those of other diseases, so the diagnosis needs to be confirmed through relevant examinations and diagnostic methods.
QiZhen-CaMA-13B-Checkpoint-12400: The main clinical manifestations of Reiter syndrome are acute or chronic arthritis, urethritis, conjunctivitis, and skin and mucosal lesions.
Reiter's syndrome is a chronic non-specific inflammatory disease of unknown etiology. It is a type of seronegative spondyloarthritis. It is characterized by recurrent, migratory, asymmetric polyarthritis, which may be accompanied by urethritis, conjunctivitis, and skin, mucosal, and eye lesions.
The main clinical manifestations of Reiter syndrome are acute or chronic arthritis, urethritis, conjunctivitis, and skin and mucosal lesions.
1. Arthritis: The most common, any joint can be affected, and it is more common in the small joints of the knees, ankles, and feet. Local redness, swelling, heat, pain, and dysfunction may occur.
2. Urethritis: manifested by frequent urination, urgency, dysuria, redness and swelling of the urethral orifice, increased urethral secretions, etc.
3. Conjunctivitis: manifests as unilateral or bilateral conjunctival inflammation, which may include conjunctival congestion, edema, eyelid edema, increased secretions, etc.
4. Skin and mucosal lesions: manifested as seborrheic erythema, genital erythema, pustules, skin pustules, and skin ulcers.

Experimental evaluation

Drug indication evaluation

Evaluation criteria: Randomly select 94 drugs and compose instructions following the template "Indications for {drug}". ChatGPT (gpt3.5), ChatGLM, and QiZhen each answer, and medical professionals then compare each model's answers with the drug's package insert and score them. The three scoring standards are:

Standard 1: If the model answer hits one indication, the answer is correct;
Standard 2: If the number of indications hit by the model answer is greater than or equal to 1/2 of the number of indications in the drug package insert, the answer is correct;
Standard 3: If the number of indications hit by the model answer is greater than or equal to 2/3 of the number of indications in the drug package insert, the answer is correct;
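
The three standards above can be expressed as a small scoring function. This is a sketch for illustration: in the evaluation itself the matching was done by medical professionals, so the `hits` count here is assumed to be supplied by a human reviewer, and the function name is hypothetical.

```python
from typing import Dict


def score_answer(hits: int, total: int) -> Dict[str, bool]:
    """Apply the three standards to one model answer.

    hits:  indications in the answer that match the package insert
    total: indications listed in the package insert (must be positive)
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return {
        "standard_1": hits >= 1,
        # Integer comparisons avoid rounding issues at the 1/2 and 2/3 thresholds.
        "standard_2": 2 * hits >= total,
        "standard_3": 3 * hits >= 2 * total,
    }


print(score_answer(hits=2, total=3))
```

An answer that hits 1 of 3 indications passes only Standard 1, while 2 of 3 passes all three, which is why accuracy drops from Standard 1 to Standard 3 in the results table.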

Model	Standard 1	Standard 2	Standard 3
ChatGLM	39.36%	23.16%	14.74%
ChatGPT	47.87%	30.85%	15.96%
QiZhen-Chinese-LLaMA-7B-Checkpoint-3500	77.66%	55.32%	40.00%
QiZhen-Chinese-LLaMA-7B-Checkpoint-6000	90.43%	73.40%	65.96%
QiZhen-CaMA-13B-Checkpoint-3600	82.29%	60.62%	47.92%
QiZhen-CaMA-13B-Checkpoint-6000	90.43%	80.85%	72.34%
QiZhen-CaMA-13B-Checkpoint-12400	91.49%	82.98%	72.34%

Remarks:

QiZhen-Chinese-LLaMA-7B-Checkpoint-6000: replies may show a "repetition" phenomenon (we are continuing to fix this problem); if it occurs, increase the repetition_penalty parameter;
QiZhen-ChatGLM-6B-Checkpoint-2500 was not evaluated because, during instruction fine-tuning, we found that ChatGLM could not adequately meet the requirements of factual medical knowledge Q&A: when its responses were required to be more accurate, the model's "repetition" phenomenon was quite serious; when the "repetition" problem was addressed, the factual accuracy of its answers was very poor (serious hallucination);
QiZhen-CaMA-13B-Checkpoint-3600: replies from this version show essentially no "repetition" phenomenon;
QiZhen-CaMA-13B-Checkpoint-6000: compared with QiZhen-CaMA-13B-Checkpoint-3600, we increased the number of LoRA parameters, and the results improved;
More detailed evaluation details and data will be released later.
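
The repetition_penalty remark above refers to a decoding parameter. As a minimal sketch of how such a penalty is commonly applied, the code below reimplements in plain Python the rescaling rule used by the repetition penalty processor in Hugging Face Transformers. The function name and the toy logits are illustrative assumptions, and the project's demo scripts may expose the parameter differently.

```python
from typing import Dict, Iterable


def apply_repetition_penalty(logits: Dict[int, float],
                             generated_ids: Iterable[int],
                             penalty: float) -> Dict[int, float]:
    """Lower the scores of tokens that were already generated.

    Positive logits are divided by the penalty and negative logits are
    multiplied by it, so any penalty > 1 reduces the token's score.
    """
    adjusted = dict(logits)
    for token_id in set(generated_ids):
        if token_id not in adjusted:
            continue
        score = adjusted[token_id]
        adjusted[token_id] = score * penalty if score < 0 else score / penalty
    return adjusted


# Toy vocabulary of three tokens; tokens 0 and 1 were already generated.
toy_logits = {0: 2.0, 1: -1.0, 2: 0.5}
print(apply_repetition_penalty(toy_logits, generated_ids=[0, 1], penalty=2.0))
```

A value of 1.0 leaves the logits unchanged; larger values suppress repeated tokens more strongly, at the risk of penalizing terms that legitimately recur in an answer, such as a drug name.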

Disease evaluation

Evaluation criteria: Randomly select 100 diseases and compose "therapeutic drug", "examination", and "clinical manifestation" instructions following the templates "Which drugs can treat {disease}?", "What examinations are needed for {disease}?", and "What are the clinical manifestations of {disease}?". ChatGPT (gpt3.5), ChatGLM, and QiZhen each answer, and medical professionals then compare each model's answers with the disease knowledge in the Qizhen Medical Knowledge Base and score them. The three scoring standards are:

Standard 1: If the model answer hits one "therapeutic drug" ("examination", "clinical manifestation"), the answer is correct;
Standard 2: If the number of "therapeutic drugs" ("examinations", "clinical manifestations") hit by the model answer is greater than or equal to 1/2 of the number of corresponding items in the reference disease knowledge, the answer is correct;
Standard 3: If the number of "therapeutic drugs" ("examinations", "clinical manifestations") hit by the model answer is greater than or equal to 2/3 of the number of corresponding items in the reference disease knowledge, the answer is correct;

Model	Clinical manifestations, Standard 1	Clinical manifestations, Standard 2	Clinical manifestations, Standard 3	Examinations, Standard 1	Examinations, Standard 2	Examinations, Standard 3	Therapeutic drugs, Standard 1	Therapeutic drugs, Standard 2	Therapeutic drugs, Standard 3
ChatGLM	90.00%	6.00%	3.00%	93.00%	11.00%	6.00%	60.00%	10.00%	5.00%
ChatGPT	94.00%	11.00%	4.00%	97.00%	8.00%	5.00%	62.00%	11.00%	4.00%
QiZhen-CaMA-13B-Checkpoint-12400	95.00%	15.00%	7.00%	97.00%	20.00%	7.00%	75.00%	36.00%	23.00%

Acknowledgments

This project builds on the following open source projects. We thank the relevant projects and their developers.

LLaMA
Stanford Alpaca
CaMA
Chinese LLaMA & Alpaca large model

License and Disclaimer

License

See LICENSE for details

Disclaimer

The resources related to this project are for academic research only, and commercial use is strictly prohibited. When using parts that involve third-party code, please strictly follow the corresponding open source license. The content generated by the model is affected by factors such as model computation, randomness, and loss of precision from quantization, and this project does not guarantee its accuracy. This project assumes no legal responsibility for any content output by the model, nor is it responsible for any losses that may arise from the use of the related resources and output results.