TY - GEN
T1 - An End-to-End Binaural Sound Localization Model Based on the Equalization and Cancellation Theory
AU - Song, Tao
AU - Zhang, Wenwen
AU - Chen, Jing
N1 - Publisher Copyright: © (2022) by the Audio Engineering Society. All rights reserved.
PY - 2022
Y1 - 2022
AB - End-to-end frameworks have been introduced into binaural localization modeling and have achieved higher localization accuracy than other models; however, the rationale for, and interpretability of, the neural networks applied remain unclear. It is well documented that the auditory system relies on binaural cues for sound localization, and the equalization and cancellation (EC) theory describes how these cues are extracted. In this paper, an end-to-end binaural localization model based on the EC theory is proposed. In the proposed model, a convolutional neural network (CNN) with a specifically designed activation function is used to implement the EC theory. The proposed model was trained in synthesized rooms and evaluated in real rooms. Experimental results show that the CNN kernels learned by the proposed model correspond to binaural cues, and that the proposed model outperforms the current end-to-end model by a 10.73% improvement in localization accuracy and a 12.91% improvement in root mean square error (RMSE).
UR - http://www.scopus.com/inward/record.url?scp=85136321299&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85136321299
T3 - AES Europe Spring 2022 - 152nd Audio Engineering Society Convention 2022
SP - 275
EP - 283
BT - AES Europe Spring 2022 - 152nd Audio Engineering Society Convention 2022
PB - Audio Engineering Society
T2 - AES Europe Spring 2022 - 152nd Audio Engineering Society Convention 2022
Y2 - 16 May 2022 through 19 May 2022
ER -