The project consists of two parts: a voice bot and a RESTful server for interacting with it.
To run the bot locally, run python3 bot.py (or run_bot.sh) and select the desired mode of operation in the proposed menu (more details here).
To start the RESTful server, which provides an interface for interacting with the voice bot modules, run python3 rest_server.py (or run_rest_server.sh) (more details here).
To build a Docker image based on the RESTful server, run sudo docker build -t voice_chatbot:0.1 . (more details here).
ATTENTION! This was my graduation project, so the architecture and code here are not very good, I understand this, and as soon as I have time, I will update everything.
A complete list of all dependencies required for operation:
Download Voice_ChatBot_data.zip (3 GB) from Google Drive and unpack it into the root of the project (the data and install_files folders). If you are using Ubuntu 16.04 or higher, you can use install_packages.sh (tested on Ubuntu 16.04 and 18.04) to install all packages. By default, TensorFlow for CPU is installed. If you have an NVIDIA graphics card with the official driver version 410 installed, you can install TensorFlow for GPU. To do this, pass the gpu parameter when running install_packages.sh. For example:
./install_packages.sh gpu
In this case, two archives will be downloaded from my Google Drive:
1. Install_CUDA10.0_cuDNN_for410.zip (2.0 GB) with CUDA 10.0 and cuDNN 7.5.0 (only if the gpu parameter was passed). The installation is performed automatically, but if something goes wrong, there is an Install.txt instruction in the downloaded archive.
2. Voice_ChatBot_data.zip (3 GB) with training data and ready-made models. It will be automatically unpacked into the data and install_files folders in the project root.
If you cannot or do not want to use the script to install all the required packages, you must manually install RHVoice and CMUclmtk_v0.7 using the instructions in install_files/Install RHVoice.txt and install_files/Install CMUclmtk.txt. You also need to copy the language model, acoustic model and dictionary files for PocketSphinx from temp/ to /usr/local/lib/python3.6/dist-packages/pocketsphinx/model (your path to python3.6 may be different). The language model prepared_questions_plays_ru.lm and the dictionary prepared_questions_plays_ru.dic must be renamed to ru_bot_plays_ru.lm and ru_bot_plays_ru.dic (or change their names in speech_to_text.py if you have your own language model and dictionary).
The bot is based on a recurrent neural network, the AttentionSeq2Seq model. In the current implementation, it consists of 2 bidirectional LSTM cells in the encoder, an attention layer, and 2 LSTM cells in the decoder. Using an attention model allows a "soft" correspondence to be established between the input and output sequences, which improves quality and performance. The input dimension in the latest configuration is 500 and the sequence length is 26 (i.e. the maximum length of sentences in the training set). Words are converted into vectors using the word2vec encoder (with a dictionary of 445,000 words) from the gensim library. The seq2seq model is implemented using Keras and RecurrentShop. The trained seq2seq model (whose weights are located in data/plays_ru/model_weights_plays_ru.h5) with the parameters specified in the source files has an accuracy of 99.19% (i.e. the bot will answer 1577 out of 1601 questions correctly).
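For reference, building a comparable model with the seq2seq package (which is based on Keras and RecurrentShop) could look roughly like the sketch below; the hidden layer size, loss and optimizer here are assumptions, and the exact configuration used in text_to_text.py may differ:
from seq2seq.models import AttentionSeq2Seq

model = AttentionSeq2Seq(input_dim=500,       # size of the word2vec word vectors
                         input_length=26,     # maximum sentence length in the training set
                         hidden_dim=500,      # size of the LSTM state (assumed)
                         output_length=26,
                         output_dim=500,
                         depth=(2, 2),        # 2 LSTM cells in the encoder and 2 in the decoder
                         bidirectional=True)  # bidirectional encoder
model.compile(loss='mse', optimizer='adam')   # assumed training settings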
At the moment, there are 3 data sets for training the bot: 1601 question-answer pairs from various plays (data/plays_ru), 136,000 pairs from various works (data/conversations_ru, thanks to NLP Datasets) and 2,500,000 pairs from subtitles for 347 TV series (data/subtitles_ru, more details in Russian subtitles dataset). The word2vec models are trained on all data sets, but the neural network is trained only on the plays data set.
Training the word2vec model and the neural network on the plays data set without changing parameters takes approximately 7.5 hours on an NVIDIA GTX 1070 and an Intel Core i7. Training on the data sets from works and subtitles on this hardware will take at least several days.
The bot can work in several modes, which are described below.
The training set consists of 1600 question %% answer pairs taken from various Russian plays. It is stored in the file data/plays_ru/plays_ru.txt. Each question %% answer pair is written on a new line, i.e. there is only one pair per line.
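For illustration, a few lines of such a file could look like this (the pairs below are made up and are not taken from the real data set):
Что случилось? %% Ничего страшного
Ты уезжаешь? %% Да, уже пора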
All stages necessary for training are performed by the prepare() or load_prepared() and train() methods of the TextToText class from the text_to_text.py module, and by the build_language_model() method of the LanguageModel class from the preparing_speech_to_text.py module. Or you can simply use the train() function of the bot.py module.
To run the bot in training mode, run bot.py with the train parameter. For example, like this:
python3 bot.py train
Or you can simply run bot.py (or run_bot.sh) and select mode 1 and 1 in the proposed menu.
The learning process consists of several stages:
1. Preparation of the training sample.
To prepare the training sample, the source_to_prepared.py module, consisting of the SourceToPrepared class, is used. This class reads a training set from a file, separates questions and answers, removes unsupported characters and punctuation, and converts the resulting questions and answers into fixed-size sequences (using <PAD> filler words). It also prepares questions for the network and processes its responses. For example:
Input: "Зачем нужен этот класс? %% Для подготовки данных"
Output: [['<PAD>', ..., '<PAD>', '?', 'класс', 'этот', 'нужен', 'Зачем', '<GO>'], ['Для', 'подготовки', 'данных', '<EOS>', '<PAD>', ..., '<PAD>']]
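For clarity, the transformation from this example can be sketched roughly as follows (an illustrative reimplementation based only on the example above; the real logic, token handling and length limits live in source_to_prepared.py):
def prepare_question(question, max_len=26):
    # The question is tokenized, reversed, left-padded with <PAD> and terminated with <GO>
    tokens = list(reversed(question.replace('?', ' ?').split()))
    return ['<PAD>'] * (max_len - len(tokens) - 1) + tokens + ['<GO>']

def prepare_answer(answer, max_len=26):
    # The answer keeps its word order, gets an <EOS> marker and is right-padded with <PAD>
    tokens = answer.split()
    return tokens + ['<EOS>'] + ['<PAD>'] * (max_len - len(tokens) - 1)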
The training sample is read from the file data/plays_ru/plays_ru.txt, and the converted [question, answer] pairs are saved to the file data/plays_ru/prepared_plays_ru.pkl. A histogram of the sizes of questions and answers is also built and saved to data/plays_ru/histogram_of_sizes_sentences_plays_ru.png.
To prepare a training sample from the data set based on plays, simply pass the name of the corresponding file to the prepare_all() method. To prepare a training sample from the data set based on works or subtitles, you must first call combine_conversations() or combine_subtitles() and then call prepare_all().
2. Translation of words into real vectors.
The word_to_vec.py module, consisting of the WordToVec class, is responsible for this stage. This class encodes fixed-size sequences (i.e. our questions and answers) into real-valued vectors using the word2vec encoder from the gensim library. The class implements methods for encoding all [question, answer] pairs from the training set into vectors at once, as well as for encoding a question to the network and decoding its answer. For example:
Input: [['<PAD>', ..., '<PAD>', '?', 'класс', 'этот', 'нужен', 'Зачем', '<GO>'], ['Для', 'кодирования', 'предложений', '<EOS>', '<PAD>', ..., '<PAD>']]
Output: [[[0.43271607, 0.52814275, 0.6504923, ...], [0.43271607, 0.52814275, 0.6504923, ...], ...], [[0.5464854, 1.01612, 0.15063584, ...], [0.88263285, 0.62758327, 0.6659863, ...], ...]]
(i.e. each word is encoded as a vector of length 500; this value can be changed via the size argument of the build_word2vec() method)
The [question, answer] pairs are read from the file data/plays_ru/prepared_plays_ru.pkl (which was obtained at the previous stage; to expand and improve the quality of the model, it is recommended to additionally pass the preprocessed data set from subtitles, data/subtitles_ru/prepared_subtitles_ru.pkl, to the build_word2vec() method), and the encoded pairs are saved to the file data/plays_ru/encoded_plays_ru.npz. During this process, a list of all used words, i.e. a dictionary, is also built and saved to the file data/plays_ru/w2v_vocabulary_plays_ru.txt. The trained word2vec model is saved in data/plays_ru/w2v_model_plays_ru.bin.
To translate the words of the training set into vectors, just pass the name of the corresponding file to the build_word2vec() method and set the desired parameters.
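As a rough illustration of what happens inside build_word2vec(), training a word2vec model with gensim could look like the sketch below (it assumes gensim 3.x, where the vector size argument is called size; the window, min_count and workers values are assumptions):
from gensim.models import Word2Vec

# sentences - the prepared fixed-size sequences, i.e. tokenized questions and answers
sentences = [['<PAD>', '?', 'класс', 'этот', 'нужен', 'Зачем', '<GO>'],
             ['Для', 'подготовки', 'данных', '<EOS>', '<PAD>']]
w2v_model = Word2Vec(sentences, size=500, window=5, min_count=1, workers=4)
w2v_model.wv.save_word2vec_format('data/plays_ru/w2v_model_plays_ru.bin', binary=True)

vector = w2v_model.wv['класс']  # a 500-dimensional vector for a single word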
3. Network training.
At this stage, the seq2seq model is trained on the previously prepared data. The text_to_text.py module, consisting of the TextToText class, is responsible for this. This class trains the network, saves the network model and weights, and allows you to conveniently interact with the trained model.
Training requires the file data/plays_ru/encoded_plays_ru.npz with the [question, answer] pairs encoded into vectors, which was obtained at the previous stage. During training, after every 5th epoch (this value can be changed), the latest intermediate result of network training is saved to the file data/plays_ru/model_weights_plays_ru_[iteration_number].h5, and at the last iteration to the file data/plays_ru/model_weights_plays_ru.h5 (an iteration is one network training cycle of a certain number of epochs, after which the weights are saved to a file and you can, for example, evaluate the accuracy of the network or display other parameters; by default, the number of epochs is 5 and the total number of iterations is 200). The network model is saved in the file data/plays_ru/model_plays_ru.json.
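Schematically, the iteration scheme described above could look roughly like the sketch below (illustrative only: the array names inside the .npz file and the batch size are assumptions, and the real training loop lives in text_to_text.py):
import numpy as np
from seq2seq.models import AttentionSeq2Seq

# Model comparable to the one described above (see the earlier sketch)
model = AttentionSeq2Seq(input_dim=500, input_length=26, hidden_dim=500,
                         output_length=26, output_dim=500, depth=(2, 2))
model.compile(loss='mse', optimizer='adam')

# Questions and answers encoded into vectors at the previous stage
training_data = np.load('data/plays_ru/encoded_plays_ru.npz')
x, y = training_data['x'], training_data['y']  # the array names inside the .npz are assumed

for iteration in range(200):                   # 200 iterations by default
    model.fit(x, y, batch_size=32, epochs=5)   # 5 epochs per iteration by default
    model.save_weights('data/plays_ru/model_weights_plays_ru_%i.h5' % iteration)

model.save_weights('data/plays_ru/model_weights_plays_ru.h5')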
After training the network, the quality of training is assessed by feeding all questions to the input of the trained network and comparing the network's answers with the reference answers from the training set. If the accuracy of the evaluated model is higher than 75%, the incorrect answers of the network are saved to the file data/plays_ru/wrong_answers_plays_ru.txt (so that they can be analyzed later).
To train the network, just pass the name of the corresponding file to the train() method and set the desired parameters.
4. Building a language model and dictionary for PocketSphinx.
This stage is needed if speech recognition will be used. At this stage, a static language model and a phonetic dictionary for PocketSphinx are created based on the questions from the training set (caution: the more questions in the training set, the longer it will take PocketSphinx to recognize speech). This is done by the build_language_model() method (which calls text2wfreq, wfreq2vocab, text2idngram and idngram2lm from CMUclmtk_v0.7) of the LanguageModel class from the preparing_speech_to_text.py module. This method uses the questions from the file with the original training set (before they are prepared by the source_to_prepared.py module), saves the language model to the file temp/prepared_questions_plays_ru.lm, and the dictionary to temp/prepared_questions_plays_ru.dic (plays_ru may change depending on which training set was used). At the end, the language model and dictionary are copied to /usr/local/lib/python3.x/dist-packages/pocketsphinx/model with the names ru_bot_plays_ru.lm and ru_bot_plays_ru.dic (plays_ru can change in the same way as in the previous step; you will need to enter the root user password).
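For reference, a typical CMUclmtk pipeline for building such a language model is sketched below via subprocess (illustration only: the intermediate file names are hypothetical, the phonetic .dic dictionary is built separately and is not shown, and the exact options used by build_language_model() may differ):
import subprocess

questions = 'temp/questions.txt'           # plain text file with one question per line (hypothetical name)
vocab = 'temp/questions.vocab'
idngram = 'temp/questions.idngram'
lm = 'temp/prepared_questions_plays_ru.lm'

# Count word frequencies and build a vocabulary
subprocess.run('text2wfreq < %s | wfreq2vocab > %s' % (questions, vocab), shell=True, check=True)
# Convert the text into id n-grams using that vocabulary
subprocess.run('text2idngram -vocab %s -idngram %s < %s' % (vocab, idngram, questions), shell=True, check=True)
# Build the ARPA-format language model
subprocess.run('idngram2lm -vocab_type 0 -idngram %s -vocab %s -arpa %s' % (idngram, vocab, lm), shell=True, check=True)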
The predict() function of the bot.py module (a wrapper over the predict() method of the TextToText class from the text_to_text.py module) is intended for interacting with the trained seq2seq model. This function supports several operating modes. In text mode, i.e. when the user enters a question from the keyboard and the network responds with text, only the predict() method of the TextToText class from the text_to_text.py module is used. This method accepts a string with a question to the network and returns a string with the network's response. To work, it needs: the file data/plays_ru/w2v_model_plays_ru.bin with the trained word2vec model, the file data/plays_ru/model_plays_ru.json with the parameters of the network model, and the file data/plays_ru/model_weights_plays_ru.h5 with the weights of the trained network.
To run the bot in this mode, run bot.py with the predict parameter. For example, like this:
python3 bot.py predict
You can also simply run bot.py (or run_bot.sh) and select mode 2 and 1 in the proposed menu.
This mode differs from the previous one in that the parameter speech_synthesis = True is passed to the predict() function of the bot.py module. This means that interaction with the network will proceed in the same way as in mode 2, but the network's response will additionally be voiced.
Voicing of answers, i.e. speech synthesis, is implemented in the get() method of the TextToSpeech class from the text_to_speech.py module. This class requires RHVoice-client to be installed and passes it the necessary parameters for speech synthesis via command line arguments (installation of RHVoice and examples of using RHVoice-client are described in install_files/Install RHVoice.txt). The get() method takes as input the string that needs to be converted to speech and, if required, the name of a .wav file in which the synthesized speech will be saved (with a sampling rate of 32 kHz, a depth of 16 bits, mono; if not specified, the speech is played immediately after synthesis). When creating an object of the TextToSpeech class, you can specify the name of the voice to use. 4 voices are supported: the male voice Aleksandr and three female voices, Anna, Elena and Irina (more details in the RHVoice Wiki).
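Based on the description above, using the class directly could look roughly like this (a purely illustrative sketch: the constructor and method signatures are defined in text_to_speech.py and the argument forms here are assumptions):
from text_to_speech import TextToSpeech

tts = TextToSpeech('anna')                   # name of the voice to use (assumed argument form)
tts.get('который час', 'temp/answer.wav')    # synthesize the string and save it to a .wav file
tts.get('который час')                       # without a file name the speech is played right away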
To run the bot in this mode, run bot.py with the predict -ss parameters. For example, like this:
python3 bot.py predict -ss
You can also simply run bot.py (or run_bot.sh) and select mode 3 and 1 in the proposed menu.
To work in this mode, you need to pass the parameter speech_recognition = True to the predict() function of the bot.py module. This means that interaction with the network, or rather the entering of questions, will be carried out by voice.
Speech recognition is implemented in the get() method of the SpeechToText class of the speech_to_text.py module. This class uses PocketSphinx and the language model with the dictionary (ru_bot_plays_ru.lm and ru_bot_plays_ru.dic) that were built in network training mode. The get() method can work in two modes: from_file, speech recognition from a .wav or .opus file with a sampling rate >=16 kHz, 16 bit, mono (the file name is passed as a function argument), and from_microphone, speech recognition from a microphone. The operating mode is set when creating an instance of the SpeechRecognition class, because loading the language model takes some time (the larger the model, the longer it takes to load).
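Based on the description above, usage could be sketched roughly as follows (purely illustrative: the class and argument names follow the text above and may differ from what speech_to_text.py actually defines):
from speech_to_text import SpeechToText

stt = SpeechToText('from_file')         # the operating mode is set when the object is created (assumed argument form)
text = stt.get('temp/question.wav')     # recognition from a .wav/.opus file (>=16 kHz, 16 bit, mono)

stt_mic = SpeechToText('from_microphone')
text = stt_mic.get()                    # recognition from the microphone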
To run the bot in this mode, run bot.py with the parameters predict -sr. For example, like this:
python3 bot.py predict -sr
You can also simply run bot.py (or run_bot.sh) and select mode 4 and 1 in the proposed menu.
This is a combination of modes 3 and 4.
To work in this mode, you need to pass the parameters speech_recognition = True and speech_synthesis = True to the predict() function of the bot.py module. This means that questions will be entered by voice and the network's responses will be spoken aloud. A description of the modules used can be found in the description of modes 3 and 4.
To run the bot in this mode, run bot.py with the parameters predict -ss -sr. For example, like this:
python3 bot.py predict -sr -ss
or
python3 bot.py predict -ss -sr
You can also simply run bot.py (or run_bot.sh) and select mode 5 and 1 in the proposed menu.
This server provides a REST API for interacting with the bot. When the server starts, a neural network trained on the data set from plays is loaded. Data sets from works and subtitles are not supported yet.
The server is implemented with Flask, and the multi-threaded mode (the production version) with gevent.pywsgi.WSGIServer. The server also has a limit on the size of received data in the request body equal to 16 MB. The implementation is in the rest_server.py module.
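For reference, a minimal sketch of such a setup is shown below (the actual routes, authorization and configuration live in rest_server.py; the example route is only an illustration):
from flask import Flask, jsonify
from gevent.pywsgi import WSGIServer

app = Flask(__name__)
app.config['MAX_CONTENT_LENGTH'] = 16 * 1024 * 1024   # 16 MB limit on the request body

@app.route('/chatbot/about', methods=['GET'])
def about():
    return jsonify({'text': 'Информация о проекте.'})

# Production mode: gevent.pywsgi.WSGIServer instead of Flask's built-in test server
http_server = WSGIServer(('0.0.0.0', 5000), app)
http_server.serve_forever()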
You can start the WSGI server by running run_rest_server.sh (it starts the WSGI server on 0.0.0.0:5000).
The server supports command line arguments that make starting it a little easier. The arguments have the following structure: [key(s)] [address:port].
Possible keys:
-d - launch a test Flask server (if the key is not specified, the WSGI server will be launched)
-s - launch the server with https support (uses a self-signed certificate obtained with openssl)
Valid address:port options:
host:port - launch on the specified host and port
localaddr:port - launch with auto-detection of the machine address on the local network and the specified port
host:0 or localaddr:0 - if port = 0, any available port will be selected automatically
List of possible combinations of command line arguments and their description:
Without arguments - launch the WSGI server with auto-detection of the machine address on the local network and port 5000. For example: python3 rest_server.py
host:port - launch the WSGI server on the specified host and port. For example: python3 rest_server.py 192.168.2.102:5000
-d - launch a test Flask server on 127.0.0.1:5000. For example: python3 rest_server.py -d
-d host:port - launch a test Flask server on the specified host and port. For example: python3 rest_server.py -d 192.168.2.102:5000
-d localaddr:port - launch a test Flask server with auto-detection of the machine address on the local network and the specified port. For example: python3 rest_server.py -d localaddr:5000
-s - launch the WSGI server with https support, auto-detection of the machine address on the local network and port 5000. For example: python3 rest_server.py -s
-s host:port - launch the WSGI server with https support on the specified host and port. For example: python3 rest_server.py -s 192.168.2.102:5000
-s -d - launch a test Flask server with https support on 127.0.0.1:5000. For example: python3 rest_server.py -s -d
-s -d host:port - launch a test Flask server with https support on the specified host and port. For example: python3 rest_server.py -s -d 192.168.2.102:5000
-s -d localaddr:port - launch a test Flask server with https support, auto-detection of the machine address on the local network and the specified port. For example: python3 rest_server.py -s -d localaddr:5000
The server can choose an available port itself; to do this, specify port 0 in host:port or localaddr:port (for example: python3 rest_server.py -d localaddr:0).
A total of 5 requests are supported:
/chatbot/about - returns information about the project
/chatbot/questions - returns a list of all supported questions
/chatbot/speech-to-text - accepts a .wav/.opus file and returns the recognized string
/chatbot/text-to-speech - accepts a string and returns a .wav file with synthesized speech
/chatbot/text-to-text - accepts a string and returns the bot's response as a string
1. The server has basic HTTP authorization, i.e. to gain access to the server you need to add to each request a header containing login:password encoded with base64 (login: bot, password: test_bot). Example in Python:
import requests
import base64
auth = base64.b64encode('testbot:test'.encode())
headers = {'Authorization' : "Basic " + auth.decode()}
It will look like this:
Authorization: Basic dGVzdGJvdDp0ZXN0
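Putting the pieces together, a complete request to one of the endpoints could look like this (using the server address 192.168.2.83:5000 from the examples below):
import requests
import base64

auth = base64.b64encode('testbot:test'.encode())
headers = {'Authorization' : "Basic " + auth.decode()}
r = requests.get('http://192.168.2.83:5000/chatbot/about', headers=headers)
print(r.json().get('text'))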
2. In the speech recognition request (which is number 3), the server expects a .wav or .opus file (>=16 kHz, 16 bit, mono) with recorded speech, which is also transmitted in JSON using base64 encoding (i.e. the .wav/.opus file is opened, read into a byte array, then encoded with base64, and the resulting byte array is decoded from its byte form into a utf-8 string and placed in the JSON). In Python it looks like this:
# Forming the request
import requests
import base64

addr = '192.168.2.83:5000'  # server address from the examples below
auth = base64.b64encode('testbot:test'.encode())
headers = {'Authorization' : "Basic " + auth.decode()}
with open('test.wav', 'rb') as audio:
    data = audio.read()
data = base64.b64encode(data)
data = {'wav' : data.decode()}
# Sending the request to the server
r = requests.post('http://' + addr + '/chatbot/speech-to-text', headers=headers, json=data)
# Parsing the response
data = r.json()
data = data.get('text')
print(data)
3. In the speech synthesis request (which is number 4), the server sends a JSON response with a .wav file (16 bit, 32 kHz, mono) containing the synthesized speech, encoded as described above (to decode it, get the desired string from the JSON into a byte array, then decode it using base64 and write it to a file or stream so that it can be played later). An example in Python:
# Forming the request
import requests
import base64

addr = '192.168.2.83:5000'  # server address from the examples below
auth = base64.b64encode('testbot:test'.encode())
headers = {'Authorization' : "Basic " + auth.decode()}
data = {'text':'который час'}
# Sending the request to the server
r = requests.post('http://' + addr + '/chatbot/text-to-speech', headers=headers, json=data)
# Parsing the response
data = r.json()
data = base64.b64decode(data.get('wav'))
with open('/home/vladislav/Проекты/Voice chat bot/temp/answer.wav', 'wb') as audio:
    audio.write(data)
All transmitted data is wrapped in JSON (including errors).
Answer to /chatbot/about:
{
"text" : "Информация о проекте."
}
Answer to /chatbot/questions:
{
"text" : ["Вопрос 1",
"Вопрос 2",
"Вопрос 3"]
}
Request to /chatbot/speech-to-text:
{
"wav" : "UklGRuTkAABXQVZFZm10IBAAAAABAAEAAH..."
}
or
{
"opus" : "ZFZm10IBUklQVZFZm10IBARLASBAAEOpH..."
}
The server will reply:
{
"text" : "который час"
}
Request to /chatbot/text-to-speech:
{
"text" : "который час"
}
The server will reply:
{
"wav" : "UklGRuTkAABXQVZFZm10IBAAAAABAAEAAH..."
}
Request to /chatbot/text-to-text:
{
"text" : "прощай"
}
The server will reply:
{
"text" : "это снова я"
}
1. GET request to /chatbot/about
An example of a request generated by python-requests:
GET /chatbot/about HTTP/1.1
Host: 192.168.2.83:5000
Connection: keep-alive
Accept-Encoding: gzip, deflate
Authorization: Basic dGVzdGJvdDp0ZXN0
User-Agent: python-requests/2.9.1
An example of a request generated by curl (curl -v -u testbot:test -i http://192.168.2.83:5000/chatbot/about):
GET /chatbot/about HTTP/1.1
Host: 192.168.2.83:5000
Authorization: Basic dGVzdGJvdDp0ZXN0
User-Agent: curl/7.47.0
In both cases, the server replied:
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 305
Date: Fri, 02 Nov 2018 15:13:21 GMT
{
"text" : "Информация о проекте."
}
2. GET request to /chatbot/questions
An example of a request generated by python-requests:
GET /chatbot/questions HTTP/1.1
Host: 192.168.2.83:5000
Authorization: Basic dGVzdGJvdDp0ZXN0
User-Agent: python-requests/2.9.1
Connection: keep-alive
Accept-Encoding: gzip, deflate
An example of a request generated by curl (curl -v -u testbot:test -i http://192.168.2.83:5000/chatbot/questions):
GET /chatbot/questions HTTP/1.1
Host: 192.168.2.83:5000
Authorization: Basic dGVzdGJvdDp0ZXN0
User-Agent: curl/7.47.0
In both cases, the server replied:
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 1086
Date: Fri, 02 Nov 2018 15:43:06 GMT
{
"text" : ["Что случилось?",
"Срочно нужна твоя помощь.",
"Ты уезжаешь?",
...]
}
3. POST request to /chatbot/speech-to-text
An example of a request generated by python-requests:
POST /chatbot/speech-to-text HTTP/1.1
Host: 192.168.2.83:5000
User-Agent: python-requests/2.9.1
Accept: */*
Content-Length: 10739
Connection: keep-alive
Content-Type: application/json
Authorization: Basic dGVzdGJvdDp0ZXN0
Accept-Encoding: gzip, deflate
{
"wav" : "UklGRuTkAABXQVZFZm10IBAAAAABAAEAAH..."
}
An example of a request generated by curl (curl -v -u testbot:test -i -H "Content-Type: application/json" -X POST -d '{"wav":"UklGRuTkAABXQVZFZm10IBAAAAABAAEAAH..."}' http://192.168.2.83:5000/chatbot/speech-to-text):
POST /chatbot/speech-to-text HTTP/1.1
Host: 192.168.2.83:5000
Authorization: Basic dGVzdGJvdDp0ZXN0
User-Agent: curl/7.47.0
Accept: */*
Content-Type: application/json
Content-Length: 10739
{
"wav" : "UklGRuTkAABXQVZFZm10IBAAAAABAAEAAH..."
}
The server replied:
HTTP/1.1 200 OK
Content-Length: 81
Date: Fri, 02 Nov 2018 15:57:13 GMT
Content-Type: application/json
{
"text" : "Распознные слова из аудиозаписи"
}
4. POST request to /chatbot/text-to-speech
An example of a request generated by python-requests:
POST /chatbot/text-to-speech HTTP/1.1
Host: 192.168.2.83:5000
Connection: keep-alive
Accept: */*
User-Agent: python-requests/2.9.1
Accept-Encoding: gzip, deflate
Content-Type: application/json
Content-Length: 73
Authorization: Basic dGVzdGJvdDp0ZXN0
{
"text" : "который час"
}
An example of a request generated by curl (curl -v -u testbot:test -i -H "Content-Type: application/json" -X POST -d '{"text":"который час"}' http://192.168.2.83:5000/chatbot/text-to-speech):
POST /chatbot/text-to-speech HTTP/1.1
Host: 192.168.2.83:5000
Authorization: Basic dGVzdGJvdDp0ZXN0
User-Agent: curl/7.47.0
Accept: */*
Content-Type: application/json
Content-Length: 32
{
"text" : "который час"
}
The server replied:
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 78151
Date: Fri, 02 Nov 2018 16:36:02 GMT
{
"wav" : "UklGRuTkAABXQVZFZm10IBAAAAABAAEAAH..."
}
5. POST request to /chatbot/text-to-text
An example of a request generated by python-requests:
POST /chatbot/text-to-text HTTP/1.1
Host: 192.168.2.83:5000
Accept-Encoding: gzip, deflate
Content-Type: application/json
User-Agent: python-requests/2.9.1
Connection: keep-alive
Content-Length: 48
Accept: */*
Authorization: Basic dGVzdGJvdDp0ZXN0
{
"text" : "прощай"
}
An example of a request generated by curl (curl -v -u testbot:test -i -H "Content-Type: application/json" -X POST -d '{"text":"прощай"}' http://192.168.2.83:5000/chatbot/text-to-text):
POST /chatbot/text-to-text HTTP/1.1
Host: 192.168.2.83:5000
Authorization: Basic dGVzdGJvdDp0ZXN0
User-Agent: curl/7.47.0
Accept: */*
Content-Type: application/json
Content-Length: 23
{
"text" : "прощай"
}
The server replied:
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 68
Date: Fri, 02 Nov 2018 16:41:22 GMT
{
"text" : "это снова я"
}
The project contains a Dockerfile, which allows you to build a Docker image based on this project. If you used install_packages.sh to install all the dependencies and have not installed Docker before, you will need to install it manually. For example, like this (tested on Ubuntu 16.04 and 18.04):
sudo apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys 58118E89F3A912897C070ADBF76221572C52609D
sudo apt-add-repository 'deb https://apt.dockerproject.org/repo ubuntu-xenial main' -y
sudo apt-get -y update
sudo apt-get install -y docker-engine
After installation, run sudo systemctl status docker to make sure that everything is installed and working (the output of this command should contain a line with green text active (running)).
To build the image, open a terminal in the project folder and run the command below ( -t sets the image name and its version, voice_chatbot:0.1, and . is the directory from which docker build is called; the dot means that all the files for the image are located in the current directory):
sudo docker build -t voice_chatbot:0.1 .
After this operation completes successfully, you can display a list of existing images by running:
sudo docker images
In the list you will see our image - voice_chatbot:0.1.
Now you can run this image ( -t allocates a terminal, -i is interactive mode, --rm removes the container after it finishes, -p 5000:5000 forwards all connections to port 5000 of the host machine to port 5000 of the container; you can also explicitly specify a different address to connect to from the outside, for example: -p 127.0.0.1:5000:5000 ):
sudo docker run -ti --rm -p 5000:5000 voice_chatbot:0.1
As a result, the server will start at 0.0.0.0:5000 and you can access it at the address shown in the terminal (unless you specified a different one when starting the image).
Note: the built Docker image weighs 5.2 GB. The project source files also include a .dockerignore file that lists the files that do not need to be added to the image. To minimize the size of the final image, all files related to the data sets from works and subtitles, as well as files with intermediate results of data processing and neural network training, were excluded from it. This means that the image contains only the files of the trained network and the raw source data sets.
Just in case, the project source files include a command_for_docker.txt file containing a minimal necessary set of commands for working with Docker.
If you have any questions or want to collaborate, you can write to me by email: [email protected] or on LinkedIn.