TY - JOUR
T1 - Towards Contrastive Context-Aware Conversational Emotion Recognition
AU - Zhang, Hanqing
AU - Song, Dawei
N1 - Publisher Copyright:
© 2010-2012 IEEE.
PY - 2022
Y1 - 2022
N2 - Conversational Emotion Recognition (CER) aims at classifying the emotion of each utterance in a conversation. For a target utterance, its emotion is jointly determined by multiple factors, such as conversation topics, emotion labels and intra/inter-speaker influences, in the conversational context of it. Then an important research question arises: can the effects of these contextual factors be sufficiently captured by the current CER models? To answer this question, we carry out an empirical study on four representative CER models by a context-replacement methodology. The results suggest that these models either exhibit a label-copying effect, or rely heavily on the intra/inter-speaker dependency structure within the conversation, but do not make a good use of the semantics carried by the conversational context. Thus, there is a high risk that they overfit certain single factors, yet lacking a holistic understanding of the semantic context. To tackle the problem, we propose a semantic-guided contrastive context-aware CER method, namely C3ER, to augment/regularize a backbone CER model, which can be any neural CER framework. Specifically, C3ER takes the hidden states of utterances from the CER model as input, extracts the contrast pairs consisting of relevant and irrelevant utterances to the conversational context of a target utterance, and uses contrastive learning to establish a soft semantic constraint between the target utterance and its context. It is then jointly trained with the main CER model, forcing the model to gain a semantic understanding of the context. Extensive experimental results show that C3ER can significantly boost the accuracy and improve the robustness of the representative CER models.
AB - Conversational Emotion Recognition (CER) aims at classifying the emotion of each utterance in a conversation. For a target utterance, its emotion is jointly determined by multiple factors, such as conversation topics, emotion labels and intra/inter-speaker influences, in the conversational context of it. Then an important research question arises: can the effects of these contextual factors be sufficiently captured by the current CER models? To answer this question, we carry out an empirical study on four representative CER models by a context-replacement methodology. The results suggest that these models either exhibit a label-copying effect, or rely heavily on the intra/inter-speaker dependency structure within the conversation, but do not make a good use of the semantics carried by the conversational context. Thus, there is a high risk that they overfit certain single factors, yet lacking a holistic understanding of the semantic context. To tackle the problem, we propose a semantic-guided contrastive context-aware CER method, namely C3ER, to augment/regularize a backbone CER model, which can be any neural CER framework. Specifically, C3ER takes the hidden states of utterances from the CER model as input, extracts the contrast pairs consisting of relevant and irrelevant utterances to the conversational context of a target utterance, and uses contrastive learning to establish a soft semantic constraint between the target utterance and its context. It is then jointly trained with the main CER model, forcing the model to gain a semantic understanding of the context. Extensive experimental results show that C3ER can significantly boost the accuracy and improve the robustness of the representative CER models.
KW - Conversational emotion recognition
KW - contrastive learning
KW - conversational context
KW - semantic constraint
UR - http://www.scopus.com/inward/record.url?scp=85139876872&partnerID=8YFLogxK
U2 - 10.1109/TAFFC.2022.3212994
DO - 10.1109/TAFFC.2022.3212994
M3 - Article
AN - SCOPUS:85139876872
SN - 1949-3045
VL - 13
SP - 1879
EP - 1891
JO - IEEE Transactions on Affective Computing
JF - IEEE Transactions on Affective Computing
IS - 4
ER -