ASRT is a Chinese speech recognition system based on deep learning. If you like it, please click "Star" ~
ReadMe Language | Chinese version | English |
ASRT project homepage | Release version download | View the Wiki documentation of this project | Online demo | Donate to the author
If you run into any problems while running or using this program, please open an issue promptly and I will respond as soon as possible. The project author communicates in QQ group 894112051. To join the WeChat group, please add the AI Lemon WeChat ID: ailemon-me and note "ASRT Speech Recognition".
Please read the project documentation, FAQ, and existing Issues carefully before asking, to avoid duplicate questions.
If anything goes wrong while the program is running, please include a complete screenshot when asking, and state the CPU architecture, GPU model, operating system, the Python, TensorFlow, and CUDA versions, and whether any code has been modified or any datasets added or removed.
This project is implemented with tf.keras (TensorFlow), based on a deep convolutional neural network, a long short-term memory (LSTM) network, an attention mechanism, and CTC.
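For orientation only, here is a stripped-down tf.keras sketch of this kind of CNN + recurrent + CTC acoustic model. It is not the actual ASRT model code: the layer sizes, the label-set size, and the names are placeholders, and the attention mechanism is omitted.

```python
# Minimal tf.keras sketch of a CNN + BiLSTM acoustic model trained with CTC.
# NOT the real ASRT model definition; shapes and layer sizes are illustrative.
import tensorflow as tf
from tensorflow.keras import layers

NUM_PINYIN = 1428  # placeholder size of the pinyin label set

def build_acoustic_model(feature_dim=200):
    # Input: (time, feature_dim, 1) spectrogram-like features, variable time length
    inputs = tf.keras.Input(shape=(None, feature_dim, 1), name="features")
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D(pool_size=(2, 2))(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(pool_size=(2, 2))(x)
    # Collapse the frequency axis so each time step becomes one feature vector
    x = layers.Reshape((-1, (feature_dim // 4) * 64))(x)
    x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)
    # +1 output class for the CTC "blank" symbol; training would use a CTC loss
    # such as tf.nn.ctc_loss together with the label and frame lengths.
    outputs = layers.Dense(NUM_PINYIN + 1, activation="softmax", name="pinyin_probs")(x)
    return tf.keras.Model(inputs, outputs)

model = build_acoustic_model()
model.summary()
```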
The following takes running under a Linux system as an example:
First, clone this project to your computer with Git, and then download the datasets required for training; see the download links at the end of this document.
$ git clone https://github.com/nl8590687/ASRT_SpeechRecognition.git
Or you can use the "Fork" button to make a copy of this project, and then clone it locally using your own SSH key.
After cloning the repository with Git, enter the project root directory and create a directory for storing the data, such as /data/speech_data (a soft link works as well), then extract the downloaded datasets directly into it.
Note that in the current version, six datasets (Thchs30, ST-CMDS, Primewords, aishell-1, aidatatang200, and MagicData) are included by default in the configuration file; please remove the ones you do not need yourself. If you want to use other datasets, you need to add the data configuration yourself and organize the data in advance in the standard format supported by ASRT.
$ cd ASRT_SpeechRecognition
$ mkdir /data/speech_data
$ tar zxf <dataset archive filename> -C /data/speech_data/
Download the pinyin label files of the default datasets:
$ python download_default_datalist.py
Models currently available are 24, 25, 251 and 251bn
Before running this project, please install the required Python 3 dependency libraries.
To start training for this project, please execute:
$ python3 train_speech_model.py
To start testing this project, please execute:
$ python3 evaluate_speech_model.py
Before testing, please ensure that the model file path filled in the code exists.
Predict speech recognition text for a single audio file:
$ python3 predict_speech_file.py
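For reference, the sketch below shows roughly what decoding a single utterance with a CTC-trained Keras model involves. It is not the contents of predict_speech_file.py; the model path, feature file, and pinyin table are placeholders.

```python
# Illustrative sketch of decoding one utterance with a CTC-trained Keras model.
# Paths, feature extraction, and the pinyin symbol table are placeholders.
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("path/to/saved_model")  # hypothetical model path
features = np.load("path/to/features.npy")                 # (time, feat_dim), precomputed
batch = features[np.newaxis, ..., np.newaxis]               # -> (1, time, feat_dim, 1)

probs = model.predict(batch)                                # (1, out_time, num_classes)
out_len = np.array([probs.shape[1]])
# Greedy CTC decoding: collapse repeated labels and drop the blank symbol
decoded, _ = tf.keras.backend.ctc_decode(probs, input_length=out_len, greedy=True)
ids = decoded[0].numpy()[0]

pinyin_list = ["a1", "a2", "..."]                           # placeholder symbol table
print([pinyin_list[i] for i in ids if 0 <= i < len(pinyin_list)])
```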
To start the API server of the ASRT HTTP protocol, please execute:
$ python3 asrserver_http.py
Locally test whether calling the HTTP protocol API service is successful:
$ python3 client_http.py
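For a rough idea of what such an HTTP call can look like, here is a minimal Python client sketch. The endpoint path and JSON field names are assumptions, so treat client_http.py and the Wiki as the authoritative reference for the actual request schema.

```python
# Minimal sketch of calling the ASRT HTTP API with the `requests` library.
# The URL path and field names are assumptions; see client_http.py for the real schema.
import base64
import wave
import requests

with wave.open("test.wav", "rb") as f:  # 16 kHz, 16-bit mono audio assumed
    payload = {
        "channels": f.getnchannels(),
        "sample_rate": f.getframerate(),
        "byte_width": f.getsampwidth(),
        "samples": base64.b64encode(f.readframes(f.getnframes())).decode("utf-8"),
    }

resp = requests.post("http://127.0.0.1:20001/all", json=payload, timeout=30)
print(resp.json())
```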
To start the API server of the ASRT GRPC protocol, please execute:
$ python3 asrserver_grpc.py
Locally test whether calling the GRPC protocol API service is successful:
$ python3 client_grpc.py
Please note that after starting the API server, you need to use the client software that accompanies this ASRT project to perform speech recognition. See the Wiki for details on downloading the ASRT speech recognition client SDK and demo.
If you want to train and use a model other than 251bn, please modify the corresponding `from speech_model.xxx import xxx` import statement in the code.
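For example (purely illustrative; the exact module and class names depend on your ASRT version, so check the source tree before editing):

```python
# Hypothetical example of switching the model import in train_speech_model.py and
# evaluate_speech_model.py; the module/class names below may differ in your version.
# from speech_model_zoo import SpeechModel251BN   # default 251bn model
from speech_model_zoo import SpeechModel251       # switch to the 251 model
```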
Use docker to deploy ASRT directly:
$ docker pull ailemondocker/asrt_service:1.3.0
$ docker run --rm -it -p 20001:20001 -p 20002:20002 --name asrt-server -d ailemondocker/asrt_service:1.3.0
This image performs CPU-only inference (recognition); it is not intended for training.
DCNN+CTC
The maximum length of the input audio is 16 seconds, and the output is the corresponding Chinese pinyin sequence.
The trained model is included in the packaged server program of the release version, which can be downloaded from the ASRT download page.
The Releases page of this GitHub repository also contains release notes for each version, and the zip package attached to each version includes the packaged server program with the trained model.
Maximum-entropy hidden Markov model based on a probabilistic graph
The input is a Chinese pinyin sequence, and the output is the corresponding Chinese character text.
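To illustrate the general idea of decoding a pinyin sequence into Chinese characters with such a probabilistic model, here is a toy Viterbi decoder. The lexicon and probabilities are made up for illustration; this is not the project's actual maximum-entropy model or its trained parameters.

```python
# Toy Viterbi decoder turning a pinyin sequence into Chinese characters.
# The candidate table and probabilities are invented for illustration only.
import math

candidates = {          # pinyin -> possible characters (tiny toy lexicon)
    "zhong1": ["中", "钟"],
    "guo2":   ["国", "过"],
}
transition = {          # P(next_char | prev_char), toy values
    ("中", "国"): 0.9, ("中", "过"): 0.1,
    ("钟", "国"): 0.2, ("钟", "过"): 0.8,
}

def viterbi(pinyin_seq):
    # paths: best (log-probability, text) ending in each candidate character
    paths = {ch: (0.0, ch) for ch in candidates[pinyin_seq[0]]}
    for py in pinyin_seq[1:]:
        new_paths = {}
        for ch in candidates[py]:
            new_paths[ch] = max(
                (lp + math.log(transition.get((prev, ch), 1e-6)), text + ch)
                for prev, (lp, text) in paths.items()
            )
        paths = new_paths
    return max(paths.values())[1]

print(viterbi(["zhong1", "guo2"]))  # -> 中国
```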
Currently, the best model achieves roughly 85% Chinese pinyin accuracy on the test set.
If you are not sure how to set up the environment, simply run the following command (provided you have a GPU and that Python 3.9, CUDA 11.2, and cuDNN 8.1 are already installed):
$ pip install -r requirements.txt
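After installation, you can quickly check whether TensorFlow actually sees your GPU (a generic TensorFlow check, not an ASRT-specific script):
$ python3 -c "import tensorflow as tf; print(tf.__version__, tf.config.list_physical_devices('GPU'))"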
Dependency environment and performance configuration requirements
For the complete list, please see: Several recent free and open-source Chinese speech datasets
Dataset | Duration | Size | Download (China mirror) | Download (international) |
---|---|---|---|---|
THCHS30 | 40h | 6.01G | data_thchs30.tgz | data_thchs30.tgz |
ST-CMDS | 100h | 7.67G | ST-CMDS-20170001_1-OS.tar.gz | ST-CMDS-20170001_1-OS.tar.gz |
AIShell-1 | 178h | 14.51G | data_aishell.tgz | data_aishell.tgz |
Primewords | 100h | 8.44G | primewords_md_2018_set1.tar.gz | primewords_md_2018_set1.tar.gz |
MagicData | 755h | 52G/1.0G/2.2G | train_set.tar.gz/dev_set.tar.gz/test_set.tar.gz | train_set.tar.gz/dev_set.tar.gz/test_set.tar.gz |
Note: how to decompress the AISHELL-1 dataset
$ tar xzf data_aishell.tgz
$ cd data_aishell/wav
$ for tar in *.tar.gz; do tar xvf $tar; done
Special thanks! Thanks to the predecessors for their open speech datasets.
If the dataset links provided above cannot be opened or downloaded, please use the OpenSLR link.
ASRT provides client SDKs for various platforms and programming languages, which clients can use to call and build speech recognition features over RPC. On other platforms, the speech recognition features can be accessed directly through the general RESTful open API. See the ASRT project documentation for the specific steps.
Client platform | Project repository link |
---|---|
Windows Client SDK and Demo | ASRT_SDK_WinClient |
Cross-platform Python3 client SDK and Demo | ASRT_SDK_Python3 |
Cross-platform Golang client SDK and Demo | asrt-sdk-go |
Java Client SDK and Demo | ASRT_SDK_Java |
Please check this article for the principles of ASRT:
Please see the ASRT training and deployment tutorial:
For frequently asked questions about the principles of statistical language models, please see:
For questions about CTC, please see:
For more content, please visit the author’s blog: AI Lemon Blog
Or use the AI Lemon site search engine to search for related information
GPL v3.0 © nl8590687 Author: AI Lemon
DOI: 10.5281/zenodo.5808434
Contributor page
@nl8590687 (repo owner)