Sentimental Visual Captioning using Multimodal Transformer

Xinxiao Wu*, Tong Li

*此作品的通讯作者

科研成果: 期刊稿件文章同行评审

8 引用 (Scopus)

摘要

We propose a new task called sentimental visual captioning that generates captions with the inherent sentiment reflected by the input image or video. Compared with the stylized visual captioning task that requires a predefined style independent of the image or video, our new task automatically analyzes the inherent sentiment tendency from the visual content. With this in mind, we propose a multimodal Transformer model namely Senti-Transformer for sentimental visual captioning, which integrates both content and sentiment information from multiple modalities and incorporates prior sentimental knowledge to generate sentimental sentence. Specifically, we extract prior knowledge from sentimental corpus to obtain sentimental textual information and design a multi-head Transformer encoder to encode multimodal features. Then we decompose the attention layer in the middle of Transformer decoder to focus on important features of each modality, and the attended features are integrated through an intra- and inter-modality fusion mechanism for generating sentimental sentences. To effectively train the proposed model using the external sentimental corpus as well as the paired images or videos and factual sentences in existing captioning datasets, we propose a two-stage training strategy that first learns to incorporate sentimental elements into the sentences via a regularization term and then learns to generate fluent and relevant sentences with the inherent sentimental styles via reinforcement learning with a sentimental reward. Extensive experiments on both image and video datasets demonstrate the effectiveness and superiority of our Senti-Transformer on sentimental visual captioning. Source code is available at https://github.com/ezeli/InSentiCap_ext.

源语言英语
页(从-至)1073-1090
页数18
期刊International Journal of Computer Vision
131
4
DOI
出版状态已出版 - 4月 2023

指纹

探究 'Sentimental Visual Captioning using Multimodal Transformer' 的科研主题。它们共同构成独一无二的指纹。

引用此