Knowledge Transfer for on-Device Speech Emotion Recognition With Neural Structured Learning

Yi Chang; Zhao Ren; Thanh Tam Nguyen; Kun Qian; Bjorn W. Schuller

doi:10.1109/ICASSP49357.2023.10096757

School of Medical Technology

Research output: Contribution to journal › Conference article › peer-review

3 Citations (Scopus)

Abstract

Speech emotion recognition (SER) has been a popular research topic in human-computer interaction (HCI). As edge devices are rapidly springing up, applying SER to edge devices is promising for a huge number of HCI applications. Although deep learning has been investigated to improve the performance of SER by training complex models, the memory space and computational capability of edge devices represent a constraint for embedding deep learning models. We propose a neural structured learning (NSL) framework through building synthesized graphs. An SER model is trained on a source dataset and used to build graphs on a target dataset. A relatively lightweight model is then trained with the speech samples and graphs together as the input. Our experiments demonstrate that training a lightweight SER model on the target dataset with speech samples and graphs can not only produce small SER models, but also enhance the model performance compared to models with speech samples only and those using classic transfer learning strategies.

Original language	English
Journal	Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing
DOI	https://doi.org/10.1109/ICASSP49357.2023.10096757
Publication status	Published - 2023
Event	48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 - Rhodes Island, Greece; Duration: 4 Jun 2023 → 10 Jun 2023

Cite this

@article{7896fac80ea246da94a906344c7101b8,

title = "Knowledge Transfer for on-Device Speech Emotion Recognition With Neural Structured Learning",

abstract = "Speech emotion recognition (SER) has been a popular research topic in human-computer interaction (HCI). As edge devices are rapidly springing up, applying SER to edge devices is promising for a huge number of HCI applications. Although deep learning has been investigated to improve the performance of SER by training complex models, the memory space and computational capability of edge devices represents a constraint for embedding deep learning models. We propose a neural structured learning (NSL) framework through building synthesized graphs. An SER model is trained on a source dataset and used to build graphs on a target dataset. A relatively lightweight model is then trained with the speech samples and graphs together as the input. Our experiments demonstrate that training a lightweight SER model on the target dataset with speech samples and graphs can not only produce small SER models, but also enhance the model performance compared to models with speech samples only and those using classic transfer learning strategies.",

keywords = "Speech emotion recognition, edge device, lightweight deep learning, neural structured learning",

author = "Yi Chang and Zhao Ren and Nguyen, {Thanh Tam} and Kun Qian and Schuller, {Bjorn W.}",

note = "Publisher Copyright: {\textcopyright} 2023 IEEE.; 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 ; Conference date: 04-06-2023 Through 10-06-2023",

year = "2023",

doi = "10.1109/ICASSP49357.2023.10096757",

language = "English",

journal = "Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing",

issn = "0736-7791",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY  - JOUR
T1  - Knowledge Transfer for on-Device Speech Emotion Recognition With Neural Structured Learning
AU  - Chang, Yi
AU  - Ren, Zhao
AU  - Nguyen, Thanh Tam
AU  - Qian, Kun
AU  - Schuller, Bjorn W.
PY  - 2023
Y1  - 2023
N2  - Speech emotion recognition (SER) has been a popular research topic in human-computer interaction (HCI). As edge devices are rapidly springing up, applying SER to edge devices is promising for a huge number of HCI applications. Although deep learning has been investigated to improve the performance of SER by training complex models, the memory space and computational capability of edge devices represents a constraint for embedding deep learning models. We propose a neural structured learning (NSL) framework through building synthesized graphs. An SER model is trained on a source dataset and used to build graphs on a target dataset. A relatively lightweight model is then trained with the speech samples and graphs together as the input. Our experiments demonstrate that training a lightweight SER model on the target dataset with speech samples and graphs can not only produce small SER models, but also enhance the model performance compared to models with speech samples only and those using classic transfer learning strategies.
AB  - Speech emotion recognition (SER) has been a popular research topic in human-computer interaction (HCI). As edge devices are rapidly springing up, applying SER to edge devices is promising for a huge number of HCI applications. Although deep learning has been investigated to improve the performance of SER by training complex models, the memory space and computational capability of edge devices represents a constraint for embedding deep learning models. We propose a neural structured learning (NSL) framework through building synthesized graphs. An SER model is trained on a source dataset and used to build graphs on a target dataset. A relatively lightweight model is then trained with the speech samples and graphs together as the input. Our experiments demonstrate that training a lightweight SER model on the target dataset with speech samples and graphs can not only produce small SER models, but also enhance the model performance compared to models with speech samples only and those using classic transfer learning strategies.
KW  - Speech emotion recognition
KW  - edge device
KW  - lightweight deep learning
KW  - neural structured learning
UR  - http://www.scopus.com/inward/record.url?scp=85171249470&partnerID=8YFLogxK
U2  - 10.1109/ICASSP49357.2023.10096757
DO  - 10.1109/ICASSP49357.2023.10096757
M3  - Conference article
AN  - SCOPUS:85171249470
SN  - 0736-7791
JO  - Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing
JF  - Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing
T2  - 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
Y2  - 4 June 2023 through 10 June 2023
ER  -
