TY - GEN
T1 - Binaural speech enhancement based on DNN for the application of virtual reality
AU - Wang, Jin
AU - Wang, Jing
AU - Liu, Ming
AU - Yan, Zhaoyu
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2019/2/2
Y1 - 2019/2/2
N2 - Binaural sound can increase immersion in virtual reality scenes because it conveys a sense of direction, but when recorded in real-world environments it may be corrupted by noise. Some existing binaural speech enhancement or separation methods provide only a single-channel output, which loses the sense of direction. Other methods provide a dual-channel output; however, their performance degrades when the binaural clean speech and the binaural noise come from the same direction. In this paper, we propose a binaural speech enhancement method based on a deep neural network that targets the case in which the binaural clean speech and the binaural noise come from the same direction. By mapping the features of the binaural noisy speech to the labels of the binaural clean speech, a dual-channel output is obtained. In addition, a batch normalization layer is introduced to further improve performance. Compared with the baseline methods, the proposed method achieves better speech quality and intelligibility and better preserves the sense of direction in the estimated binaural speech.
KW - Binaural speech enhancement
KW - Deep neural network
KW - Log-power spectra
KW - Virtual reality
UR - https://www.scopus.com/pages/publications/85063286175
U2 - 10.1109/ICSP.2018.8652275
DO - 10.1109/ICSP.2018.8652275
M3 - Conference contribution
AN - SCOPUS:85063286175
T3 - International Conference on Signal Processing Proceedings, ICSP
SP - 629
EP - 633
BT - 2018 14th IEEE International Conference on Signal Processing Proceedings, ICSP 2018
A2 - Baozong, Yuan
A2 - Qiuqi, Ruan
A2 - Yao, Zhao
A2 - Gaoyun, An
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 14th IEEE International Conference on Signal Processing, ICSP 2018
Y2 - 12 August 2018 through 16 August 2018
ER -