TY - GEN
T1 - MemCap
T2 - 34th AAAI Conference on Artificial Intelligence, AAAI 2020
AU - Zhao, Wentian
AU - Wu, Xinxiao
AU - Zhang, Xiaoxun
N1 - Publisher Copyright:
© 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2020
Y1 - 2020
N2 - Generating stylized captions for images is a challenging task since it requires not only describing the content of the image accurately but also expressing the desired linguistic style appropriately. In this paper, we propose MemCap, a novel stylized image captioning method that explicitly encodes the knowledge about linguistic styles with memory mechanism. Rather than relying heavily on a language model to capture style factors in existing methods, our method resorts to memorizing stylized elements learned from training corpus. Particularly, we design a memory module that comprises a set of embedding vectors for encoding style-related phrases in training corpus. To acquire the style-related phrases, we develop a sentence decomposing algorithm that splits a stylized sentence into a style-related part that reflects the linguistic style and a content-related part that contains the visual content. When generating captions, our MemCap first extracts content-relevant style knowledge from the memory module via an attention mechanism and then incorporates the extracted knowledge into a language model. Extensive experiments on two stylized image captioning datasets (SentiCap and FlickrStyle10K) demonstrate the effectiveness of our method.
AB - Generating stylized captions for images is a challenging task since it requires not only describing the content of the image accurately but also expressing the desired linguistic style appropriately. In this paper, we propose MemCap, a novel stylized image captioning method that explicitly encodes the knowledge about linguistic styles with memory mechanism. Rather than relying heavily on a language model to capture style factors in existing methods, our method resorts to memorizing stylized elements learned from training corpus. Particularly, we design a memory module that comprises a set of embedding vectors for encoding style-related phrases in training corpus. To acquire the style-related phrases, we develop a sentence decomposing algorithm that splits a stylized sentence into a style-related part that reflects the linguistic style and a content-related part that contains the visual content. When generating captions, our MemCap first extracts content-relevant style knowledge from the memory module via an attention mechanism and then incorporates the extracted knowledge into a language model. Extensive experiments on two stylized image captioning datasets (SentiCap and FlickrStyle10K) demonstrate the effectiveness of our method.
UR - http://www.scopus.com/inward/record.url?scp=85091307052&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85091307052
T3 - AAAI 2020 - 34th AAAI Conference on Artificial Intelligence
SP - 12984
EP - 12992
BT - AAAI 2020 - 34th AAAI Conference on Artificial Intelligence
PB - AAAI press
Y2 - 7 February 2020 through 12 February 2020
ER -