TY - JOUR
T1 - Breaking Resource Barriers in Speech Emotion Recognition via Data Distillation
AU - Chang, Yi
AU - Ren, Zhao
AU - Zhao, Zhonghao
AU - Nguyen, Thanh Tam
AU - Qian, Kun
AU - Schultz, Tanja
AU - Schuller, Björn W.
N1 - Publisher Copyright: © 2025 International Speech Communication Association. All rights reserved.
PY - 2025
Y1 - 2025
AB - Speech emotion recognition (SER) plays a crucial role in human-computer interaction. The emergence of edge devices in the Internet of Things (IoT) presents challenges in constructing intricate deep learning models due to constraints in memory and computational resources. Moreover, emotional speech data often contains private information, raising concerns about privacy leakage during the deployment of SER models. To address these challenges, we propose a data distillation framework to facilitate efficient development of SER models in IoT applications using a synthesised, smaller, and distilled dataset. Our experiments demonstrate that the distilled dataset can be effectively utilised to train SER models with fixed initialisation, achieving performances comparable to those developed using the original full emotional speech dataset.
KW - computational paralinguistics
KW - human-computer interaction
KW - speech recognition
UR - https://www.scopus.com/pages/publications/105020074968
U2 - 10.21437/Interspeech.2025-2778
DO - 10.21437/Interspeech.2025-2778
M3 - Conference article
AN - SCOPUS:105020074968
SN - 2308-457X
SP - 141
EP - 145
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 26th Interspeech Conference 2025
Y2 - 17 August 2025 through 21 August 2025
ER -