Joint low rank embedded multiple features learning for audio–visual emotion recognition

Zhan Wang, Lizhi Wang*, Hua Huang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

12 Citations (Scopus)

Abstract

Audio–visual emotion recognition is a challenging problem in the research fields of human–computer interaction and pattern recognition. Seeking a common subspace among the heterogeneous multi-modal data is essential for audio–visual emotion recognition. In this paper, we study subspace learning for audio–visual emotion recognition by combining the similarity of intra-modality and the correlation of inter-modality. First, we enforce a low-rank constraint on the self-representation of the features in the subspace to exploit the structural similarity of intra-modality. This is based on the key observation that each modality and its corresponding features usually lie on a low-dimensional manifold. Second, we propose a joint low-rank model on the representation of inter-modality to preserve consistency across different modalities. Finally, the intra-modality similarity and inter-modality correlation are integrated within a unified framework, for which we develop an efficient computational algorithm to pursue the common subspace. Experimental results on three typical audio–visual emotion datasets demonstrate the superior performance of our method on audio–visual emotion recognition.
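To make the modelling idea concrete, an objective of this kind is often written as follows. This is an illustrative sketch, not the paper's exact formulation: the audio/visual feature matrices X_a, X_v, projections P_a, P_v, representation matrices Z_a, Z_v, and trade-off weight λ are symbols introduced here for exposition.

\[
\min_{P_a, P_v, Z_a, Z_v} \; \|Z_a\|_{*} + \|Z_v\|_{*} + \lambda \left\| \begin{bmatrix} Z_a \\ Z_v \end{bmatrix} \right\|_{*}
\quad \text{s.t.} \quad P_a X_a = (P_a X_a)\, Z_a, \;\; P_v X_v = (P_v X_v)\, Z_v,
\]

where the first two nuclear-norm terms enforce low-rank self-representation within each projected modality (intra-modality similarity), and the joint nuclear norm on the stacked representations couples the two modalities (inter-modality correlation). Objectives of this form are typically optimized with an inexact augmented Lagrangian or ADMM scheme, alternating singular-value thresholding on the representation matrices with updates of the projections.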

Original language: English
Pages (from-to): 324-333
Number of pages: 10
Journal: Neurocomputing
Volume: 388
DOIs
Publication status: Published - 7 May 2020

Keywords

  • Audio–visual emotion recognition
  • Common subspace learning
  • Low-rank representation
  • Multi-view learning
