TY - GEN
T1 - A Chinese Speech Recognition System Based on Binary Neural Network and Pre-processing
AU - Guo, Lunyi
AU - Deng, Yijie
AU - Tang, Liang
AU - Fan, Ronggeng
AU - Yan, Bo
AU - Xiao, Zhuoling
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Neural networks have made excellent progress in the field of speech recognition. However, more research is needed for scenarios where computational resources are limited or real-time, low-power operation is required. In this paper, we propose a lightweight speech recognition model based on pre-processing and a binary neural network, which significantly reduces the number of weight parameters while maintaining an acceptable error rate. The pre-processing stage converts the 1D speech signal into a 2D Mel spectrum and uses Voice Activity Detection (VAD) to make the Mel-spectrum input variable-length. The speech dataset is also expanded using data augmentation methods. For the convolutional layers, the weights are binarized to reduce the number of model parameters and improve computational and storage efficiency. After quantization, the model has only 6.94% as many parameters as the full-precision model, while the error rate on the ST CMD speech dataset increases by only 2.07%. Finally, a circuit structure based on binary weights is designed for convolutional computation, in which a single multiplication can be implemented using the hardware resources of only seven look-up tables (LUTs).
AB - Neural networks have made excellent progress in the field of speech recognition. However, more research is needed for scenarios where computational resources are limited or real-time, low-power operation is required. In this paper, we propose a lightweight speech recognition model based on pre-processing and a binary neural network, which significantly reduces the number of weight parameters while maintaining an acceptable error rate. The pre-processing stage converts the 1D speech signal into a 2D Mel spectrum and uses Voice Activity Detection (VAD) to make the Mel-spectrum input variable-length. The speech dataset is also expanded using data augmentation methods. For the convolutional layers, the weights are binarized to reduce the number of model parameters and improve computational and storage efficiency. After quantization, the model has only 6.94% as many parameters as the full-precision model, while the error rate on the ST CMD speech dataset increases by only 2.07%. Finally, a circuit structure based on binary weights is designed for convolutional computation, in which a single multiplication can be implemented using the hardware resources of only seven look-up tables (LUTs).
KW - edge computing
KW - binary-weight neural network
KW - speech recognition
KW - voice activity detection
UR - http://www.scopus.com/inward/record.url?scp=85149786082&partnerID=8YFLogxK
U2 - 10.1109/WCCCT56755.2023.10052123
DO - 10.1109/WCCCT56755.2023.10052123
M3 - Conference contribution
AN - SCOPUS:85149786082
T3 - 2023 6th World Conference on Computing and Communication Technologies, WCCCT 2023
SP - 129
EP - 134
BT - 2023 6th World Conference on Computing and Communication Technologies, WCCCT 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 6th World Conference on Computing and Communication Technologies, WCCCT 2023
Y2 - 6 January 2023 through 8 January 2023
ER -