Joint low rank embedded multiple features learning for audio–visual emotion recognition

Zhan Wang, Lizhi Wang*, Hua Huang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

12 Citations (Scopus)

Abstract

Audio–visual emotion recognition is a challenging problem in the research fields of human–computer interaction and pattern recognition. Seeking a common subspace among the heterogeneous multi-modal data is essential for audio–visual emotion recognition. In this paper, we study subspace learning for audio–visual emotion recognition by combining the similarity of intra-modality and the correlation of inter-modality. First, we enforce a low-rank constraint on the self-representation of the features in the subspace to exploit the structural similarity of intra-modality. This is based on the key observation that each modality and its corresponding features usually lie on a low-dimensional manifold. Second, we propose a joint low-rank model on the representation of inter-modality to preserve consistency across different modalities. Finally, the intra-modality similarity and inter-modality correlation are integrated within a unified framework, for which we develop an efficient computational algorithm to pursue the common subspace. Experimental results on three typical audio–visual emotion datasets demonstrate the superior performance of our method on audio–visual emotion recognition.
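To make the modelling idea concrete, an objective of this kind is often written as follows. This is an illustrative sketch, not the paper's exact formulation: the audio/visual feature matrices X_a, X_v, projections P_a, P_v, representation matrices Z_a, Z_v, and trade-off weight λ are symbols introduced here for exposition.

\[
\min_{P_a, P_v, Z_a, Z_v} \; \|Z_a\|_{*} + \|Z_v\|_{*} + \lambda \left\| \begin{bmatrix} Z_a \\ Z_v \end{bmatrix} \right\|_{*}
\quad \text{s.t.} \quad P_a X_a = (P_a X_a)\, Z_a, \;\; P_v X_v = (P_v X_v)\, Z_v,
\]

where the first two nuclear-norm terms enforce low-rank self-representation within each projected modality (intra-modality similarity), and the joint nuclear norm on the stacked representations couples the two modalities (inter-modality correlation). Objectives of this form are typically optimized with an inexact augmented Lagrangian or ADMM scheme, alternating singular-value thresholding on the representation matrices with updates of the projections.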

Original language: English
Pages (from-to): 324-333
Number of pages: 10
Journal: Neurocomputing
Volume: 388
DOIs
Publication status: Published - 7 May 2020

Keywords

  • Audio–visual emotion recognition
  • Common subspace learning
  • Low-rank representation
  • Multi-view learning
