Abstract
Audio–visual emotion recognition is a challenging problem in human–computer interaction and pattern recognition. Seeking a common subspace among heterogeneous multi-modal data is essential for audio–visual emotion recognition. In this paper, we study subspace learning for audio–visual emotion recognition by combining intra-modality similarity with inter-modality correlation. First, we enforce a low-rank constraint on the self-representation of the features in the subspace to exploit the structural similarity within each modality, based on the key observation that each modality and its features usually lie on a low-dimensional manifold. Second, we propose a joint low-rank model on the inter-modality representation to keep the representations consistent across modalities. Finally, the intra-modality similarity and inter-modality correlation are integrated into a unified framework, for which we develop an efficient computational algorithm to pursue the common subspace. Experimental results on three typical audio–visual emotion datasets demonstrate the superior performance of our method on audio–visual emotion recognition.
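For orientation, a standard way to instantiate the low-rank self-representation constraint mentioned above is the classical low-rank representation (LRR) model; the sketch below is an illustration under that assumption, with generic symbols (X for the stacked features of one modality, Z for its self-representation coefficients, E for noise) rather than the paper's own notation:

```latex
\min_{Z,\,E} \; \|Z\|_{*} + \lambda \,\|E\|_{2,1}
\quad \text{s.t.} \quad X = XZ + E
```

Here the nuclear norm \(\|Z\|_{*}\) encourages a low-rank coefficient matrix, capturing the low-dimensional manifold structure within a modality, while \(\|E\|_{2,1}\) absorbs sample-specific corruption. One plausible reading of the joint low-rank model described in the abstract is to add a coupling term that ties the audio and visual coefficient matrices together so the learned subspace stays consistent across modalities.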
| Original language | English |
| --- | --- |
| Pages (from-to) | 324-333 |
| Number of pages | 10 |
| Journal | Neurocomputing |
| Volume | 388 |
| DOIs | |
| Publication status | Published - 7 May 2020 |
Keywords
- Audio–visual emotion recognition
- Common subspace learning
- Low-rank representation
- Multi-view learning