M3GAT: A Multi-modal, Multi-task Interactive Graph Attention Network for Conversational Sentiment Analysis and Emotion Recognition

Yazhou Zhang; Ao Jia; Bo Wang; Peng Zhang; Dongming Zhao; Pu Li; Yuexian Hou; Xiaojia Jin; Dawei Song; Jing Qin

doi:10.1145/3593583

Yazhou Zhang, Ao Jia, Bo Wang, Peng Zhang, Dongming Zhao, Pu Li, Yuexian Hou, Xiaojia Jin, Dawei Song^*, Jing Qin^*

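^*Corresponding author for this work

School of Computer Science

Research output: Contribution to journal › Article › Peer-reviewed

26 citations (Scopus)

Abstract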

Sentiment and emotion, which correspond to long-term and short-lived human feelings, are closely linked to each other, leading to the fact that sentiment analysis and emotion recognition are also two interdependent tasks in natural language processing (NLP). One task often leverages the shared knowledge from another task and performs better when solved in a joint learning paradigm. Conversational context dependency, multi-modal interaction, and multi-task correlation are three key factors that contribute to this joint paradigm. However, none of the recent approaches have considered them in a unified framework. To fill this gap, we propose a multi-modal, multi-task interactive graph attention network, termed M3GAT, to simultaneously solve the three problems. At the heart of the model is a proposed interactive conversation graph layer containing three core sub-modules, which are: (1) local-global context connection for modeling both local and global conversational context, (2) cross-modal connection for learning multi-modal complementary and (3) cross-task connection for capturing the correlation across two tasks. Comprehensive experiments on three benchmarking datasets, MELD, MEISD, and MSED, show the effectiveness of M3GAT over state-of-the-art baselines with the margin of 1.88%, 5.37%, and 0.19% for sentiment analysis, and 1.99%, 3.65%, and 0.13% for emotion recognition, respectively. In addition, we also show the superiority of multi-task learning over the single-task framework.

Original language	English
Article number	13
Journal	ACM Transactions on Information Systems
Volume	42
Issue	1
DOI	https://doi.org/10.1145/3593583
Publication status	Published - 21 Aug 2023

Cite this

Zhang, Y., Jia, A., Wang, B., Zhang, P., Zhao, D., Li, P., Hou, Y., Jin, X., Song, D., & Qin, J. (2023). M3GAT: A Multi-modal, Multi-task Interactive Graph Attention Network for Conversational Sentiment Analysis and Emotion Recognition. ACM Transactions on Information Systems, 42(1), Article 13. https://doi.org/10.1145/3593583

@article{3cae33912d8d4b74ac6bc2d7c637ea0d,

title = "M3GAT: A Multi-modal, Multi-task Interactive Graph Attention Network for Conversational Sentiment Analysis and Emotion Recognition",

abstract = "Sentiment and emotion, which correspond to long-term and short-lived human feelings, are closely linked to each other, leading to the fact that sentiment analysis and emotion recognition are also two interdependent tasks in natural language processing (NLP). One task often leverages the shared knowledge from another task and performs better when solved in a joint learning paradigm. Conversational context dependency, multi-modal interaction, and multi-task correlation are three key factors that contribute to this joint paradigm. However, none of the recent approaches have considered them in a unified framework. To fill this gap, we propose a multi-modal, multi-task interactive graph attention network, termed M3GAT, to simultaneously solve the three problems. At the heart of the model is a proposed interactive conversation graph layer containing three core sub-modules, which are: (1) local-global context connection for modeling both local and global conversational context, (2) cross-modal connection for learning multi-modal complementary and (3) cross-task connection for capturing the correlation across two tasks. Comprehensive experiments on three benchmarking datasets, MELD, MEISD, and MSED, show the effectiveness of M3GAT over state-of-the-art baselines with the margin of 1.88%, 5.37%, and 0.19% for sentiment analysis, and 1.99%, 3.65%, and 0.13% for emotion recognition, respectively. In addition, we also show the superiority of multi-task learning over the single-task framework.",

keywords = "Multi-modal sentiment analysis, emotion recognition, graph neural network, multi-task learning",

author = "Yazhou Zhang and Ao Jia and Bo Wang and Peng Zhang and Dongming Zhao and Pu Li and Yuexian Hou and Xiaojia Jin and Dawei Song and Jing Qin",

note = "Publisher Copyright: {\textcopyright} 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.",

year = "2023",

month = aug,

day = "21",

doi = "10.1145/3593583",

language = "English",

volume = "42",

journal = "ACM Transactions on Information Systems",

issn = "1046-8188",

publisher = "Association for Computing Machinery (ACM)",

number = "1",

}

TY - JOUR

T1 - M3GAT: A Multi-modal, Multi-task Interactive Graph Attention Network for Conversational Sentiment Analysis and Emotion Recognition

AU - Zhang, Yazhou

AU - Jia, Ao

AU - Wang, Bo

AU - Zhang, Peng

AU - Zhao, Dongming

AU - Li, Pu

AU - Hou, Yuexian

AU - Jin, Xiaojia

AU - Song, Dawei

AU - Qin, Jing

PY - 2023/8/21

Y1 - 2023/8/21

N2 - Sentiment and emotion, which correspond to long-term and short-lived human feelings, are closely linked to each other, leading to the fact that sentiment analysis and emotion recognition are also two interdependent tasks in natural language processing (NLP). One task often leverages the shared knowledge from another task and performs better when solved in a joint learning paradigm. Conversational context dependency, multi-modal interaction, and multi-task correlation are three key factors that contribute to this joint paradigm. However, none of the recent approaches have considered them in a unified framework. To fill this gap, we propose a multi-modal, multi-task interactive graph attention network, termed M3GAT, to simultaneously solve the three problems. At the heart of the model is a proposed interactive conversation graph layer containing three core sub-modules, which are: (1) local-global context connection for modeling both local and global conversational context, (2) cross-modal connection for learning multi-modal complementary and (3) cross-task connection for capturing the correlation across two tasks. Comprehensive experiments on three benchmarking datasets, MELD, MEISD, and MSED, show the effectiveness of M3GAT over state-of-the-art baselines with the margin of 1.88%, 5.37%, and 0.19% for sentiment analysis, and 1.99%, 3.65%, and 0.13% for emotion recognition, respectively. In addition, we also show the superiority of multi-task learning over the single-task framework.

AB - Sentiment and emotion, which correspond to long-term and short-lived human feelings, are closely linked to each other, leading to the fact that sentiment analysis and emotion recognition are also two interdependent tasks in natural language processing (NLP). One task often leverages the shared knowledge from another task and performs better when solved in a joint learning paradigm. Conversational context dependency, multi-modal interaction, and multi-task correlation are three key factors that contribute to this joint paradigm. However, none of the recent approaches have considered them in a unified framework. To fill this gap, we propose a multi-modal, multi-task interactive graph attention network, termed M3GAT, to simultaneously solve the three problems. At the heart of the model is a proposed interactive conversation graph layer containing three core sub-modules, which are: (1) local-global context connection for modeling both local and global conversational context, (2) cross-modal connection for learning multi-modal complementary and (3) cross-task connection for capturing the correlation across two tasks. Comprehensive experiments on three benchmarking datasets, MELD, MEISD, and MSED, show the effectiveness of M3GAT over state-of-the-art baselines with the margin of 1.88%, 5.37%, and 0.19% for sentiment analysis, and 1.99%, 3.65%, and 0.13% for emotion recognition, respectively. In addition, we also show the superiority of multi-task learning over the single-task framework.

KW - Multi-modal sentiment analysis

KW - emotion recognition

KW - graph neural network

KW - multi-task learning

UR - http://www.scopus.com/inward/record.url?scp=85175605040&partnerID=8YFLogxK

U2 - 10.1145/3593583

DO - 10.1145/3593583

M3 - Article

AN - SCOPUS:85175605040

SN - 1046-8188

VL - 42

JO - ACM Transactions on Information Systems

JF - ACM Transactions on Information Systems

IS - 1

M1 - 13

ER -
