TY - GEN
T1 - ANA-Mix
T2 - 3rd IEEE International Conference on Sensors, Electronics and Computer Engineering, ICSECE 2025
AU - Wang, Xiaoliang
AU - Wang, Yu
AU - Liu, Ye
AU - Zhou, Xudong
AU - Liu, Fengming
AU - Yu, Fengge
AU - Zhang, Shuai
AU - Li, Guozheng
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - This paper presents the Airport Noise-AISHELL Mix (ANA-Mix), a rich and realistic dataset for advancing speech recognition and voice interaction systems in complex airport acoustic conditions. The noisy speech dataset is constructed by combining the publicly available AISHELL-3 Mandarin speech dataset with environmental noise recorded at airports. AISHELL-3 provides a wide variety of high-quality sentence recordings, while the airport noise data covers typical airport noise scenarios, including crowd conversations, luggage rolling, and boarding announcements. A data mixing method superimposes clean speech and randomly selected airport noise at the waveform level to create 200,000 noisy speech samples: approximately 100,000 single-speaker samples and another 100,000 multi-speaker samples (2 to 4 speakers). The resulting speech is close to that of the actual deployment environment. The dataset can be used for a variety of tasks such as speech recognition, voiceprint recognition, and speech enhancement, and has potential value for improving the performance of voice interaction systems.
AB - This paper presents the Airport Noise-AISHELL Mix (ANA-Mix), a rich and realistic dataset for advancing speech recognition and voice interaction systems in complex airport acoustic conditions. The noisy speech dataset is constructed by combining the publicly available AISHELL-3 Mandarin speech dataset with environmental noise recorded at airports. AISHELL-3 provides a wide variety of high-quality sentence recordings, while the airport noise data covers typical airport noise scenarios, including crowd conversations, luggage rolling, and boarding announcements. A data mixing method superimposes clean speech and randomly selected airport noise at the waveform level to create 200,000 noisy speech samples: approximately 100,000 single-speaker samples and another 100,000 multi-speaker samples (2 to 4 speakers). The resulting speech is close to that of the actual deployment environment. The dataset can be used for a variety of tasks such as speech recognition, voiceprint recognition, and speech enhancement, and has potential value for improving the performance of voice interaction systems.
KW - airport noise
KW - AISHELL-3
KW - speech enhancement
KW - speech recognition
UR - https://www.scopus.com/pages/publications/105030441537
U2 - 10.1109/ICSECE65727.2025.11257069
DO - 10.1109/ICSECE65727.2025.11257069
M3 - Conference contribution
AN - SCOPUS:105030441537
T3 - 2025 IEEE 3rd International Conference on Sensors, Electronics and Computer Engineering, ICSECE 2025
SP - 98
EP - 102
BT - 2025 IEEE 3rd International Conference on Sensors, Electronics and Computer Engineering, ICSECE 2025
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 29 August 2025 through 31 August 2025
ER -