ASRT is a Chinese speech recognition system based on deep learning. If you like it, please click "Star" ~
ReadMe Language | Chinese version | English |
ASRT project homepage | Release version download | View the Wiki documentation of this project | Online demo | Donate to the author
If you run into any problems while running or using this program, please open an issue promptly and I will respond as soon as possible. The project author communicates in QQ group 894112051. To join the WeChat group, please add the AI Lemon WeChat ID: ailemon-me and note "ASRT Speech Recognition".
Please read the project documentation, FAQ, and existing Issues carefully before asking, to avoid duplicate questions.
If anything goes wrong while the program is running, please include a complete screenshot when asking, and state the CPU architecture, GPU model, operating system, the Python, TensorFlow, and CUDA versions, and whether any code has been modified or any datasets added or removed.
This project is implemented with tf.keras (TensorFlow), based on a deep convolutional neural network, a long short-term memory (LSTM) network, an attention mechanism, and CTC.
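For orientation only, here is a stripped-down tf.keras sketch of this kind of CNN + recurrent + CTC acoustic model. It is not the actual ASRT model code: the layer sizes, the label-set size, and the names are placeholders, and the attention mechanism is omitted.

```python
# Minimal tf.keras sketch of a CNN + BiLSTM acoustic model trained with CTC.
# NOT the real ASRT model definition; shapes and layer sizes are illustrative.
import tensorflow as tf
from tensorflow.keras import layers

NUM_PINYIN = 1428  # placeholder size of the pinyin label set

def build_acoustic_model(feature_dim=200):
    # Input: (time, feature_dim, 1) spectrogram-like features, variable time length
    inputs = tf.keras.Input(shape=(None, feature_dim, 1), name="features")
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D(pool_size=(2, 2))(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(pool_size=(2, 2))(x)
    # Collapse the frequency axis so each time step becomes one feature vector
    x = layers.Reshape((-1, (feature_dim // 4) * 64))(x)
    x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)
    # +1 output class for the CTC "blank" symbol; training would use a CTC loss
    # such as tf.nn.ctc_loss together with the label and frame lengths.
    outputs = layers.Dense(NUM_PINYIN + 1, activation="softmax", name="pinyin_probs")(x)
    return tf.keras.Model(inputs, outputs)

model = build_acoustic_model()
model.summary()
```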
The following takes running under a Linux system as an example:
First, clone this project to your computer with Git, and then download the datasets required for training; see the download links at the end of this document.
$ git clone https://github.com/nl8590687/ASRT_SpeechRecognition.git
Or you can use the "Fork" button to make a copy of this project, and then clone it locally using your own SSH key.
After cloning the repository with Git, enter the project root directory and create a directory for storing the data, such as /data/speech_data (a soft link works as well), then extract the downloaded datasets directly into it.
Note that in the current version, six datasets (Thchs30, ST-CMDS, Primewords, aishell-1, aidatatang200, and MagicData) are included by default in the configuration file; please remove the ones you do not need yourself. If you want to use other datasets, you need to add the data configuration yourself and organize the data in advance in the standard format supported by ASRT.
$ cd ASRT_SpeechRecognition
$ mkdir /data/speech_data
$ tar zxf <dataset archive filename> -C /data/speech_data/
Download the pinyin label files of the default datasets:
$ python download_default_datalist.py
Models currently available are 24, 25, 251 and 251bn
Before running this project, please install the required Python 3 dependency libraries.
To start training for this project, please execute:
$ python3 train_speech_model.py
To start testing this project, please execute:
$ python3 evaluate_speech_model.py
Before testing, please ensure that the model file path filled in the code exists.
Predict speech recognition text for a single audio file:
$ python3 predict_speech_file.py
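For reference, the sketch below shows roughly what decoding a single utterance with a CTC-trained Keras model involves. It is not the contents of predict_speech_file.py; the model path, feature file, and pinyin table are placeholders.

```python
# Illustrative sketch of decoding one utterance with a CTC-trained Keras model.
# Paths, feature extraction, and the pinyin symbol table are placeholders.
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("path/to/saved_model")  # hypothetical model path
features = np.load("path/to/features.npy")                 # (time, feat_dim), precomputed
batch = features[np.newaxis, ..., np.newaxis]               # -> (1, time, feat_dim, 1)

probs = model.predict(batch)                                # (1, out_time, num_classes)
out_len = np.array([probs.shape[1]])
# Greedy CTC decoding: collapse repeated labels and drop the blank symbol
decoded, _ = tf.keras.backend.ctc_decode(probs, input_length=out_len, greedy=True)
ids = decoded[0].numpy()[0]

pinyin_list = ["a1", "a2", "..."]                           # placeholder symbol table
print([pinyin_list[i] for i in ids if 0 <= i < len(pinyin_list)])
```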
To start the API server of the ASRT HTTP protocol, please execute:
$ python3 asrserver_http.py
Locally test whether calling the HTTP protocol API service is successful:
$ python3 client_http.py
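For a rough idea of what such an HTTP call can look like, here is a minimal Python client sketch. The endpoint path and JSON field names are assumptions, so treat client_http.py and the Wiki as the authoritative reference for the actual request schema.

```python
# Minimal sketch of calling the ASRT HTTP API with the `requests` library.
# The URL path and field names are assumptions; see client_http.py for the real schema.
import base64
import wave
import requests

with wave.open("test.wav", "rb") as f:  # 16 kHz, 16-bit mono audio assumed
    payload = {
        "channels": f.getnchannels(),
        "sample_rate": f.getframerate(),
        "byte_width": f.getsampwidth(),
        "samples": base64.b64encode(f.readframes(f.getnframes())).decode("utf-8"),
    }

resp = requests.post("http://127.0.0.1:20001/all", json=payload, timeout=30)
print(resp.json())
```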
To start the API server of the ASRT GRPC protocol, please execute:
$ python3 asrserver_grpc.py
Locally test whether calling the GRPC protocol API service is successful:
$ python3 client_grpc.py
Please note that after starting the API server, you need to use the client software that accompanies this ASRT project to perform speech recognition. See the Wiki for details on downloading the ASRT speech recognition client SDK and demo.
If you want to train and use a model other than 251bn, please modify the corresponding `from speech_model.xxx import xxx` import statement in the code.
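For example (purely illustrative; the exact module and class names depend on your ASRT version, so check the source tree before editing):

```python
# Hypothetical example of switching the model import in train_speech_model.py and
# evaluate_speech_model.py; the module/class names below may differ in your version.
# from speech_model_zoo import SpeechModel251BN   # default 251bn model
from speech_model_zoo import SpeechModel251       # switch to the 251 model
```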
Use docker to deploy ASRT directly:
$ docker pull ailemondocker/asrt_service:1.3.0
$ docker run --rm -it -p 20001:20001 -p 20002:20002 --name asrt-server -d ailemondocker/asrt_service:1.3.0
This image performs CPU-only inference (recognition); it is not intended for training.
DCNN+CTC
The maximum length of the input audio is 16 seconds, and the output is the corresponding Chinese pinyin sequence.
The trained model is included in the packaged server program of the release version, which can be downloaded from the ASRT download page.
The Releases page of this GitHub repository also contains release notes for each version, and the zip package attached to each version includes the packaged server program with the trained model.
Maximum-entropy hidden Markov model based on a probabilistic graph
The input is a Chinese pinyin sequence, and the output is the corresponding Chinese character text.
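To illustrate the general idea of decoding a pinyin sequence into Chinese characters with such a probabilistic model, here is a toy Viterbi decoder. The lexicon and probabilities are made up for illustration; this is not the project's actual maximum-entropy model or its trained parameters.

```python
# Toy Viterbi decoder turning a pinyin sequence into Chinese characters.
# The candidate table and probabilities are invented for illustration only.
import math

candidates = {          # pinyin -> possible characters (tiny toy lexicon)
    "zhong1": ["中", "钟"],
    "guo2":   ["国", "过"],
}
transition = {          # P(next_char | prev_char), toy values
    ("中", "国"): 0.9, ("中", "过"): 0.1,
    ("钟", "国"): 0.2, ("钟", "过"): 0.8,
}

def viterbi(pinyin_seq):
    # paths: best (log-probability, text) ending in each candidate character
    paths = {ch: (0.0, ch) for ch in candidates[pinyin_seq[0]]}
    for py in pinyin_seq[1:]:
        new_paths = {}
        for ch in candidates[py]:
            new_paths[ch] = max(
                (lp + math.log(transition.get((prev, ch), 1e-6)), text + ch)
                for prev, (lp, text) in paths.items()
            )
        paths = new_paths
    return max(paths.values())[1]

print(viterbi(["zhong1", "guo2"]))  # -> 中国
```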
Currently, the best model achieves roughly 85% Chinese pinyin accuracy on the test set.
If you are not sure how to set up the environment, simply run the following command (provided you have a GPU and that Python 3.9, CUDA 11.2, and cuDNN 8.1 are already installed):
$ pip install -r requirements.txt
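After installation, you can quickly check whether TensorFlow actually sees your GPU (a generic TensorFlow check, not an ASRT-specific script):
$ python3 -c "import tensorflow as tf; print(tf.__version__, tf.config.list_physical_devices('GPU'))"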
Dependency environment and performance configuration requirements
For the complete list, please see: Several recent free and open-source Chinese speech datasets
Dataset | Duration | Size | Download (China mirror) | Download (international) |
---|---|---|---|---|
THCHS30 | 40h | 6.01G | data_thchs30.tgz | data_thchs30.tgz |
ST-CMDS | 100h | 7.67G | ST-CMDS-20170001_1-OS.tar.gz | ST-CMDS-20170001_1-OS.tar.gz |
AIShell-1 | 178h | 14.51G | data_aishell.tgz | data_aishell.tgz |
Primewords | 100h | 8.44G | primewords_md_2018_set1.tar.gz | primewords_md_2018_set1.tar.gz |
MagicData | 755h | 52G/1.0G/2.2G | train_set.tar.gz/dev_set.tar.gz/test_set.tar.gz | train_set.tar.gz/dev_set.tar.gz/test_set.tar.gz |
Note: how to decompress the AISHELL-1 dataset
$ tar xzf data_aishell.tgz
$ cd data_aishell/wav
$ for tar in *.tar.gz; do tar xvf $tar; done
Special thanks! Thanks to the predecessors for their open speech datasets.
If the dataset links provided above cannot be opened or downloaded, please use the OpenSLR link.
ASRT provides client SDKs for various platforms and programming languages, which clients can use to call and build speech recognition features over RPC. On other platforms, the speech recognition features can be accessed directly through the general RESTful open API. See the ASRT project documentation for the specific steps.
Client platform | Project repository link |
---|---|
Windows Client SDK and Demo | ASRT_SDK_WinClient |
Cross-platform Python3 client SDK and Demo | ASRT_SDK_Python3 |
Cross-platform Golang client SDK and Demo | asrt-sdk-go |
Java Client SDK and Demo | ASRT_SDK_Java |
Please check this article for the principles of ASRT:
Please see the ASRT training and deployment tutorial:
For frequently asked questions about the principles of statistical language models, please see:
For questions about CTC, please see:
For more content, please visit the author’s blog: AI Lemon Blog
Or use the AI Lemon site search engine to search for related information
GPL v3.0 © nl8590687 Author: AI Lemon
DOI: 10.5281/zenodo.5808434
Contributor page
@nl8590687 (repo owner)