A Chinese chatbot written by @Doragd :snowman:, trained on the interesting Chinese corpus qingyun.
She may not be that perfect :muscle: or that great :paw_prints:,
but I coded her myself :sparkling_heart:, so
I hope everyone will give a few more stars to support this NLP beginner and his friend Xiaozhi.
This project is actually a sub-module of a software engineering course project. Our goal was to develop an intelligent customer service ticket processing system.
The workflow of the intelligent customer service ticket system is as follows. When a user asks the system a question, the system first searches the knowledge base for a relevant question. If one exists, it returns the corresponding answer; if the user is still not satisfied, he or she can submit a ticket directly. If no relevant question exists in the knowledge base, this chatbot is called to reply automatically.
The service scenario is similar to Tencent Cloud's customer service system. Customers mostly come to ask about related topics (cloud servers, domain names, etc.), so the knowledge base is likewise a collection of (question, answer) pairs covering consultation and troubleshooting for cloud servers, domain names, and so on.
The front-end interface and the message exchange between the front end and back end were built by another classmate, @adjlyadv, mainly with React and Django.
@Doragd was responsible for building the knowledge base and for writing, training, and testing the chatbot, which is what this repo contains.
│ .gitignore
│ config.py #model configuration parameters
│ corpus.pth #preprocessed dataset
│ dataload.py #dataloader
│ datapreprocess.py #data preprocessing
│ LICENSE
│ main.py
│ model.py
│ README.md
│ requirements.txt
│ train_eval.py #training, validation, and testing
│
├─checkpoints
│ chatbot_0509_1437 #trained model
│
├─clean_chat_corpus
│ qingyun.tsv #corpus
│
├─QA_data
│ QA.db #knowledge base
│ QA_test.py #called when the knowledge base is used
│ stop_words.txt #stop words
│ __init__.py
│
└─utils
  beamsearch.py #TODO: unfinished
  greedysearch.py #greedy search, used for testing
  __init__.py
Install dependencies
$ pip install -r requirements.txt
$ python datapreprocess.py
Preprocess the corpus to generate corpus.pth (corpus.pth is already included in this repo, so this step can be skipped).
Modifiable parameters:
# datapreprocess.py
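corpus_file = 'clean_chat_corpus/qingyun.tsv' #raw dialogue dataset
max_voc_length = 10000 #maximum vocabulary size
min_word_appear = 10 #minimum frequency for a word to enter the vocabulary
max_sentence_length = 50 #maximum sentence length
save_path = 'corpus.pth' #where to save the preprocessed dialogue dataset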
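To illustrate what these parameters usually control, here is a minimal sketch of frequency-based vocabulary building and length filtering. It is not the code in datapreprocess.py: the function names `build_vocab` and `filter_pairs`, the special tokens, and the choice to count special tokens toward the size limit are all assumptions made for illustration.

```python
from collections import Counter

# Values mirror the parameters listed above.
max_voc_length = 10000
min_word_appear = 10
max_sentence_length = 50


def build_vocab(pairs, max_size=max_voc_length, min_count=min_word_appear):
    """Map frequent tokens to integer ids (hypothetical helper).

    `pairs` is a list of (question_tokens, answer_tokens) tuples.
    The four special tokens are an assumption and count toward max_size.
    """
    counts = Counter(tok for q, a in pairs for tok in q + a)
    vocab = {"<pad>": 0, "<unk>": 1, "<sos>": 2, "<eos>": 3}
    # most_common() is sorted by descending count, so we can stop early.
    for tok, n in counts.most_common():
        if n < min_count or len(vocab) >= max_size:
            break
        vocab[tok] = len(vocab)
    return vocab


def filter_pairs(pairs, max_len=max_sentence_length):
    """Drop pairs in which either side is longer than max_len tokens."""
    return [(q, a) for q, a in pairs if len(q) <= max_len and len(a) <= max_len]
```

Tokens that fall below the frequency threshold would then typically be mapped to `<unk>` when sentences are converted to ids.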
To use the knowledge base, pass in the parameter use_QA_first=True.
In this mode, the input string is first matched against the knowledge base, and the best-matching question's answer is returned. If no match is found, the chatbot is called to generate a reply automatically.
The knowledge base here is a collection of 100 frequently asked questions and answers compiled from the official Tencent Cloud documentation. It is for testing only!
$ python main.py chat --use_QA_first=True
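The knowledge-base-first behaviour described above can be sketched as a small routing function. This is only an illustration under stated assumptions: `reply`, `kb_lookup`, `generate`, `toy_kb`, and `toy_generate` are hypothetical names, and the dictionary stands in for QA_data/QA.db and the trained model, whose real interfaces are not shown here.

```python
def reply(text, kb_lookup, generate, use_QA_first=True):
    """Return a knowledge-base answer if one matches, else a generated reply.

    kb_lookup(text) is assumed to return None when nothing matches.
    """
    if use_QA_first:
        answer = kb_lookup(text)
        if answer is not None:
            return answer
    # Either the knowledge base is disabled or it had no match.
    return generate(text)


# Toy stand-ins for the knowledge base and the chatbot (illustrative data only).
toy_kb = {
    "how do I register a domain name": "Open the domain registration page in the console.",
}


def toy_generate(text):
    return "(generated reply)"


print(reply("how do I register a domain name", toy_kb.get, toy_generate))
print(reply("hello", toy_kb.get, toy_generate))
```

Passing the lookup and the generator as arguments keeps the routing independent of how matching or generation is actually implemented.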
The Tencent Cloud question-answer pairs were added only because the course project required them and are unrelated to the chatbot itself, so for general use pass use_QA_first=False. This parameter defaults to True.
$ python main.py chat --use_QA_first=False
$ python main.py chat
To leave the chat, enter exit, quit, or q.
The available parameters are explained in the config.py file.
To pass in new parameter values, just supply them on the command line, in the form:
$ python main.py chat --model_ckpt='checkpoints/chatbot_0509_1437' --use_QA_first=False
The command above specifies the path of the trained model to load and whether to use the knowledge base.
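One way such `--key=value` overrides can be applied to a configuration object is sketched below. This is not taken from main.py or config.py: the `Config` class shown here lists only the two options named above, its `model_ckpt` default is a placeholder, and `apply_overrides` is a hypothetical helper.

```python
import ast


class Config:
    # Only the two options mentioned above; the None default is a placeholder.
    model_ckpt = None
    use_QA_first = True


def apply_overrides(cfg, argv):
    """Apply --key=value arguments to cfg, rejecting unknown keys."""
    for arg in argv:
        if not arg.startswith("--") or "=" not in arg:
            raise ValueError(f"expected --key=value, got {arg!r}")
        key, raw = arg[2:].split("=", 1)
        if not hasattr(cfg, key):
            raise AttributeError(f"unknown option: {key}")
        try:
            # Handles True, False, numbers, and quoted strings.
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Unquoted text such as a path stays a plain string.
            value = raw
        setattr(cfg, key, value)
    return cfg


# The shell strips the quotes, so the path arrives unquoted.
cfg = apply_overrides(
    Config(),
    ["--model_ckpt=checkpoints/chatbot_0509_1437", "--use_QA_first=False"],
)
print(cfg.model_ckpt, cfg.use_QA_first)
```

Rejecting unknown keys means a mistyped option name fails loudly instead of being silently ignored.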
Corpus name | Corpus size | Source | Characteristics | Sample | Word-segmented? |
---|---|---|---|---|---|
qingyun (Qingyun corpus) | 100K | A chatbot chat group | Relatively good quality, close to everyday life | Q: It seems you love money very much. A: Oh, really? Then you're almost there | No |
$ python train_eval.py train [--options]
Quantitative evaluation has not been written yet; it should be measured by perplexity. For now, the script can only generate sentences, and their quality has to be judged manually.
$ python train_eval.py eval [--options]
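Since perplexity is mentioned as the intended metric, here is a minimal sketch of how it is commonly computed from per-token log-probabilities. It is not part of train_eval.py; the function name `perplexity` and the assumption that the model yields natural-log probabilities for each reference token are illustrative.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(-mean natural-log probability of the reference tokens)."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# A model that gives every reference token probability 0.25
# has a perplexity of approximately 4.
print(perplexity([math.log(0.25)] * 4))
```

Lower values mean the model assigns higher probability to the reference replies. Padding tokens would normally be excluded before averaging.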