TY - GEN
T1 - An Entanglement-driven Fusion Neural Network for Video Sentiment Analysis
AU - Gkoumas, Dimitris
AU - Li, Qiuchi
AU - Yu, Yijun
AU - Song, Dawei
N1 - Publisher Copyright:
© 2021 International Joint Conferences on Artificial Intelligence. All rights reserved.
PY - 2021
Y1 - 2021
N2 - Video data is multimodal in nature: an utterance can involve linguistic, visual, and acoustic information. A key challenge for video sentiment analysis is therefore how to effectively combine different modalities for sentiment recognition. The latest neural network approaches achieve state-of-the-art performance, but they largely neglect how humans understand and reason about sentiment states. By contrast, recent advances in quantum probabilistic neural models have achieved performance comparable to the state of the art, with better transparency and an increased level of interpretability. However, existing quantum-inspired models treat quantum states either as a classical mixture or as a separable tensor product across modalities, without modeling their interactions in a way that makes them correlated or non-separable (i.e., entangled). This means that current models have not fully exploited the expressive power of quantum probabilities. To fill this gap, we propose a transparent quantum probabilistic neural model. The model induces different modalities to interact in such a way that they may not be separable, encoding crossmodal information in the form of non-classical correlations. A comprehensive evaluation on two benchmark datasets for video sentiment analysis shows that the model achieves significant performance improvements. We also show that the degree of non-separability between modalities optimizes post-hoc interpretability.
AB - Video data is multimodal in nature: an utterance can involve linguistic, visual, and acoustic information. A key challenge for video sentiment analysis is therefore how to effectively combine different modalities for sentiment recognition. The latest neural network approaches achieve state-of-the-art performance, but they largely neglect how humans understand and reason about sentiment states. By contrast, recent advances in quantum probabilistic neural models have achieved performance comparable to the state of the art, with better transparency and an increased level of interpretability. However, existing quantum-inspired models treat quantum states either as a classical mixture or as a separable tensor product across modalities, without modeling their interactions in a way that makes them correlated or non-separable (i.e., entangled). This means that current models have not fully exploited the expressive power of quantum probabilities. To fill this gap, we propose a transparent quantum probabilistic neural model. The model induces different modalities to interact in such a way that they may not be separable, encoding crossmodal information in the form of non-classical correlations. A comprehensive evaluation on two benchmark datasets for video sentiment analysis shows that the model achieves significant performance improvements. We also show that the degree of non-separability between modalities optimizes post-hoc interpretability.
UR - http://www.scopus.com/inward/record.url?scp=85125502374&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85125502374
T3 - IJCAI International Joint Conference on Artificial Intelligence
SP - 1736
EP - 1742
BT - Proceedings of the 30th International Joint Conference on Artificial Intelligence, IJCAI 2021
A2 - Zhou, Zhi-Hua
PB - International Joint Conferences on Artificial Intelligence
T2 - 30th International Joint Conference on Artificial Intelligence, IJCAI 2021
Y2 - 19 August 2021 through 27 August 2021
ER -