Look, Read and Feel: Benchmarking Ads Understanding with Multimodal Multitask Learning

Huaizheng Zhang, Yong Luo, Qiming Ai, Yonggang Wen, Han Hu*

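*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

11 Citations (Scopus)

Abstract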

Given the massive advertising market and the sharply increasing volume of online multimedia content (such as videos), it is now fashionable to promote advertisements (ads) together with the multimedia content. However, manually finding relevant ads to match the provided content is labor-intensive, and hence automatic advertising techniques have been developed. Since ads are usually hard to understand from their visual appearance alone because of the visual metaphors they contain, other modalities, such as the embedded text, should be exploited for understanding. To further improve user experience, it is necessary to understand both the ads' topic and sentiment. This motivates us to develop a novel deep multimodal multitask framework that integrates multiple modalities to achieve effective topic and sentiment prediction simultaneously for ads understanding. In particular, in our framework, termed Deep$M^2$Ad, we first extract multimodal information from ads and learn high-level and comparable representations. The visual metaphor of the ad is decoded in an unsupervised manner. The obtained representations are then fed into the proposed hierarchical multimodal attention modules to learn task-specific representations for final prediction. A multitask loss function is also designed to jointly train both the topic and sentiment prediction models in an end-to-end manner, where bottom-layer parameters are shared to alleviate over-fitting. We conduct extensive experiments on a large-scale advertisement dataset and achieve state-of-the-art performance for both prediction tasks. The obtained results could be utilized as a benchmark for ads understanding.
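
The two ideas the abstract combines, attention-based fusion of modality representations and a joint loss over the topic and sentiment tasks, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the dot-product attention, the single-label cross-entropy for both tasks, and the weighting parameter `alpha` are all assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(modality_vectors, query):
    """Fuse per-modality vectors (e.g. image, text) into one vector.

    Hypothetical helper: scores each modality by its dot product with a
    task-specific query, then returns the attention-weighted sum.
    """
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in modality_vectors]
    weights = softmax(scores)
    dim = len(modality_vectors[0])
    fused = [
        sum(w * vec[i] for w, vec in zip(weights, modality_vectors))
        for i in range(dim)
    ]
    return fused, weights


def cross_entropy(logits, label):
    """Cross-entropy of one example given raw logits and a class index."""
    return -math.log(softmax(logits)[label])


def multitask_loss(topic_logits, topic_label, senti_logits, senti_label, alpha=0.5):
    """Weighted sum of the two task losses (alpha is an assumed hyperparameter)."""
    topic_term = cross_entropy(topic_logits, topic_label)
    senti_term = cross_entropy(senti_logits, senti_label)
    return alpha * topic_term + (1.0 - alpha) * senti_term
```

In a trained model, the shared bottom layers would produce the modality vectors, each task would hold its own query and classifier head, and minimizing the combined loss would update the shared parameters from both tasks at once.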

Original language: English
Title of host publication: MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia
Publisher: Association for Computing Machinery, Inc
Pages: 430-438
Number of pages: 9
ISBN (Electronic): 9781450379885
DOI
Publication status: Published - 12 Oct 2020
Event: 28th ACM International Conference on Multimedia, MM 2020 - Virtual, Online, United States
Duration: 12 Oct 2020 to 16 Oct 2020

Publication series

Name: MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia

Conference

Conference: 28th ACM International Conference on Multimedia, MM 2020
Country/Territory: United States
City: Virtual, Online
Period: 12/10/20 to 16/10/20
