Knowledge Transfer for on-Device Speech Emotion Recognition With Neural Structured Learning

Yi Chang; Zhao Ren; Thanh Tam Nguyen; Kun Qian; Bjorn W. Schuller

doi:10.1109/ICASSP49357.2023.10096757

School of Medical and Technology

Research output: Contribution to journal › Conference article › peer-review

5 Citations (Scopus)

Abstract

Speech emotion recognition (SER) has been a popular research topic in human-computer interaction (HCI). As edge devices are rapidly springing up, applying SER to edge devices is promising for a huge number of HCI applications. Although deep learning has been investigated to improve the performance of SER by training complex models, the memory space and computational capability of edge devices represent a constraint for embedding deep learning models. We propose a neural structured learning (NSL) framework through building synthesized graphs. An SER model is trained on a source dataset and used to build graphs on a target dataset. A relatively lightweight model is then trained with the speech samples and graphs together as the input. Our experiments demonstrate that training a lightweight SER model on the target dataset with speech samples and graphs can not only produce small SER models, but also enhance the model performance compared to models with speech samples only and those using classic transfer learning strategies.
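The abstract outlines a pipeline: a model trained on a source dataset is used to build a graph over the target-dataset samples, and a lightweight model is then trained on the samples together with that graph. The Python sketch below illustrates the general graph-regularization idea behind neural structured learning under stated assumptions; it is not the authors' implementation. The function names `build_graph` and `graph_regularized_loss`, the cosine-similarity threshold used to create edges, the squared-L2 neighbour penalty on the student's logits, the weight `alpha`, and the toy embeddings (standing in for outputs of the source-trained SER model) are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_graph(embeddings, threshold=0.8):
    """Link target samples whose teacher embeddings are similar.

    Assumption: `embeddings` come from an SER model trained on the source
    dataset; an undirected edge (weight = similarity) is added whenever the
    cosine similarity reaches `threshold`.
    """
    graph = {i: [] for i in range(len(embeddings))}
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            sim = cosine_similarity(embeddings[i], embeddings[j])
            if sim >= threshold:
                graph[i].append((j, sim))
                graph[j].append((i, sim))
    return graph


def softmax_cross_entropy(logits, label):
    """Numerically stable cross-entropy (log-sum-exp minus true-class logit)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[label]


def graph_regularized_loss(student_logits, labels, graph, alpha=0.1):
    """Supervised loss plus a weighted penalty on neighbours' output distance.

    Assumption: the penalty is the edge-weighted squared L2 distance between
    the student's logits for neighbouring samples, scaled by `alpha`.
    Each undirected edge is visited from both ends, so it counts twice.
    """
    n = len(student_logits)
    supervised = sum(
        softmax_cross_entropy(student_logits[i], labels[i]) for i in range(n)
    ) / n
    neighbour = 0.0
    for i, edges in graph.items():
        for j, weight in edges:
            dist = sum(
                (a - b) ** 2 for a, b in zip(student_logits[i], student_logits[j])
            )
            neighbour += weight * dist
    return supervised + alpha * neighbour / n


# Toy usage: three target samples, two emotion classes (hypothetical data).
teacher_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
graph = build_graph(teacher_embeddings, threshold=0.8)
labels = [0, 0, 1]
student_logits = [[2.0, 0.0], [0.5, 1.0], [0.0, 2.0]]
print(graph)
print(graph_regularized_loss(student_logits, labels, graph, alpha=0.1))
```

In this toy run, the first two samples are linked because their hypothetical teacher embeddings are nearly parallel, so the penalty pushes the lightweight student towards similar outputs for them, while the third sample stays unconnected. Here the graph only shapes the training loss; whether and how the paper feeds the graph to the model beyond that is not specified in this record.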

Original language	English
Journal	Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing
DOIs	https://doi.org/10.1109/ICASSP49357.2023.10096757
Publication status	Published - 2023
Event	48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 - Rhodes Island, Greece
Duration	4 Jun 2023 → 10 Jun 2023

Keywords

Speech emotion recognition
edge device
lightweight deep learning
neural structured learning

Access to Document

10.1109/ICASSP49357.2023.10096757

Cite this

Chang, Y., Ren, Z., Nguyen, T. T., Qian, K., & Schuller, B. W. (2023). Knowledge Transfer for on-Device Speech Emotion Recognition With Neural Structured Learning. Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing. https://doi.org/10.1109/ICASSP49357.2023.10096757

@inproceedings{7896fac80ea246da94a906344c7101b8,
title = "Knowledge Transfer for on-Device Speech Emotion Recognition With Neural Structured Learning",
abstract = "Speech emotion recognition (SER) has been a popular research topic in human-computer interaction (HCI). As edge devices are rapidly springing up, applying SER to edge devices is promising for a huge number of HCI applications. Although deep learning has been investigated to improve the performance of SER by training complex models, the memory space and computational capability of edge devices represent a constraint for embedding deep learning models. We propose a neural structured learning (NSL) framework through building synthesized graphs. An SER model is trained on a source dataset and used to build graphs on a target dataset. A relatively lightweight model is then trained with the speech samples and graphs together as the input. Our experiments demonstrate that training a lightweight SER model on the target dataset with speech samples and graphs can not only produce small SER models, but also enhance the model performance compared to models with speech samples only and those using classic transfer learning strategies.",
keywords = "Speech emotion recognition, edge device, lightweight deep learning, neural structured learning",
author = "Yi Chang and Zhao Ren and Nguyen, {Thanh Tam} and Kun Qian and Schuller, {Bjorn W.}",
note = "Publisher Copyright: {\textcopyright} 2023 IEEE.; 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 ; Conference date: 04-06-2023 Through 10-06-2023",
year = "2023",
doi = "10.1109/ICASSP49357.2023.10096757",
language = "English",
booktitle = "Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing",
issn = "0736-7791",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

TY  - CPAPER
T1  - Knowledge Transfer for on-Device Speech Emotion Recognition With Neural Structured Learning
AU  - Chang, Yi
AU  - Ren, Zhao
AU  - Nguyen, Thanh Tam
AU  - Qian, Kun
AU  - Schuller, Bjorn W.
PY  - 2023
Y1  - 2023
N2  - Speech emotion recognition (SER) has been a popular research topic in human-computer interaction (HCI). As edge devices are rapidly springing up, applying SER to edge devices is promising for a huge number of HCI applications. Although deep learning has been investigated to improve the performance of SER by training complex models, the memory space and computational capability of edge devices represent a constraint for embedding deep learning models. We propose a neural structured learning (NSL) framework through building synthesized graphs. An SER model is trained on a source dataset and used to build graphs on a target dataset. A relatively lightweight model is then trained with the speech samples and graphs together as the input. Our experiments demonstrate that training a lightweight SER model on the target dataset with speech samples and graphs can not only produce small SER models, but also enhance the model performance compared to models with speech samples only and those using classic transfer learning strategies.
AB  - Speech emotion recognition (SER) has been a popular research topic in human-computer interaction (HCI). As edge devices are rapidly springing up, applying SER to edge devices is promising for a huge number of HCI applications. Although deep learning has been investigated to improve the performance of SER by training complex models, the memory space and computational capability of edge devices represent a constraint for embedding deep learning models. We propose a neural structured learning (NSL) framework through building synthesized graphs. An SER model is trained on a source dataset and used to build graphs on a target dataset. A relatively lightweight model is then trained with the speech samples and graphs together as the input. Our experiments demonstrate that training a lightweight SER model on the target dataset with speech samples and graphs can not only produce small SER models, but also enhance the model performance compared to models with speech samples only and those using classic transfer learning strategies.
KW  - Speech emotion recognition
KW  - edge device
KW  - lightweight deep learning
KW  - neural structured learning
UR  - http://www.scopus.com/inward/record.url?scp=85171249470&partnerID=8YFLogxK
U2  - 10.1109/ICASSP49357.2023.10096757
DO  - 10.1109/ICASSP49357.2023.10096757
M3  - Conference article
AN  - SCOPUS:85171249470
SN  - 0736-7791
JO  - Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing
JF  - Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing
T2  - 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
Y2  - 4 June 2023 through 10 June 2023
ER  - 
