TY - JOUR
T1 - Cross-modal context-gated convolution for multi-modal sentiment analysis
AU - Wen, Huanglu
AU - You, Shaodi
AU - Fu, Ying
N1 - Publisher Copyright:
© 2021 Elsevier B.V.
PY - 2021/6
Y1 - 2021/6
AB - When inferring sentiments, relying on verbal clues alone is problematic because of their ambiguity. Adding related vocal and visual contexts as complements to verbal clues can help. To infer sentiments from multi-modal temporal sequences, we need to identify both sentiment-related clues and their cross-modal interactions. However, sentiment-related behaviors in different modalities may not occur at the same time, and these behaviors and their interactions are sparse in time, making it hard to infer the correct sentiments. Moreover, unaligned sequences from sensors have varying sampling rates, which amplifies the misalignment and sparsity mentioned above. While most previous multi-modal sentiment analysis works focus only on word-aligned sequences, we propose cross-modal context-gated convolution for unaligned sequences. Cross-modal context-gated convolution captures local cross-modal interactions, handling the misalignment while reducing the effect of unrelated information. It introduces the concept of a cross-modal context gate, enabling it to capture useful cross-modal interactions more effectively, and it brings more possibilities to layer design for multi-modal sequential modeling. Experiments on multi-modal sentiment analysis datasets under both word-aligned and unaligned conditions show the validity of our approach.
KW - Affective behavior
KW - Artificial neural networks
KW - Multi-modal temporal sequences
KW - Pattern recognition
UR - http://www.scopus.com/inward/record.url?scp=85103944828&partnerID=8YFLogxK
DO - 10.1016/j.patrec.2021.03.025
M3 - Article
AN - SCOPUS:85103944828
SN - 0167-8655
VL - 146
SP - 252
EP - 259
JO - Pattern Recognition Letters
JF - Pattern Recognition Letters
ER -