Abstract
Audio–visual emotion recognition is a challenging problem in the research fields of human–computer interaction and pattern recognition. Seeking a common subspace among the heterogeneous multi-modal data is essential for audio–visual emotion recognition. In this paper, we study subspace learning for audio–visual emotion recognition by combining the similarity of intra-modality and the correlation of inter-modality. First, we enforce a low-rank constraint on the self-representation of the features in the subspace to exploit the structural similarity of intra-modality. This is based on the key observation that each modality and its corresponding features usually lie in a low-dimensional manifold. Second, we propose a joint low-rank model on the representation of inter-modality to keep consistency across different modalities. Finally, the intra-modality similarity and inter-modality correlation are integrated within a unified framework, for which we develop an efficient computational algorithm to pursue the common subspace. Experimental results on three typical audio–visual emotion datasets demonstrate the superior performance of our method on audio–visual emotion recognition.
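For readers unfamiliar with low-rank self-representation, the sketch below shows a standard formulation in the spirit of the intra-modality model described in the abstract. It is an illustrative, generic objective only, not the exact model proposed in the paper; the symbols X, Z, E, and λ are assumed notation introduced here for exposition.

```latex
% Generic low-rank self-representation objective (illustrative sketch;
% X, Z, E, and \lambda are assumed notation, not taken from the paper):
\begin{equation}
  \min_{Z,\,E}\ \|Z\|_{*} \;+\; \lambda \,\|E\|_{2,1}
  \quad \text{s.t.} \quad X = XZ + E ,
\end{equation}
% where X stacks the feature vectors of one modality column-wise,
% Z is the self-representation coefficient matrix whose nuclear norm
% \|Z\|_{*} promotes a low-rank structure, E absorbs sample-specific
% noise, and \lambda balances the two terms.
```

A joint inter-modality variant typically couples the modality-specific coefficient matrices (e.g., for audio and visual features) through a shared low-rank constraint, which is the general idea behind the consistency model mentioned in the abstract.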
| Original language | English |
| --- | --- |
| Pages (from-to) | 324-333 |
| Number of pages | 10 |
| Journal | Neurocomputing |
| Volume | 388 |
| DOI | |
| Publication status | Published - 7 May 2020 |