MemCap: Memorizing style knowledge for image captioning

Wentian Zhao; Xinxiao Wu; Xiaoxun Zhang

Wentian Zhao, Xinxiao Wu*, Xiaoxun Zhang

*Corresponding author for this work

School of Computer Science and Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

70 Citations (Scopus)

Abstract

Generating stylized captions for images is challenging because it requires not only describing the content of the image accurately but also expressing the desired linguistic style appropriately. In this paper, we propose MemCap, a novel stylized image captioning method that explicitly encodes knowledge about linguistic styles with a memory mechanism. Rather than relying heavily on a language model to capture style factors, as existing methods do, our method memorizes stylized elements learned from the training corpus. In particular, we design a memory module that comprises a set of embedding vectors for encoding style-related phrases in the training corpus. To acquire the style-related phrases, we develop a sentence decomposing algorithm that splits a stylized sentence into a style-related part that reflects the linguistic style and a content-related part that contains the visual content. When generating captions, MemCap first extracts content-relevant style knowledge from the memory module via an attention mechanism and then incorporates the extracted knowledge into a language model. Extensive experiments on two stylized image captioning datasets (SentiCap and FlickrStyle10K) demonstrate the effectiveness of our method.
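As a rough illustration of the memory read described in the abstract, the sketch below applies plain dot-product attention over a small set of embedding vectors. It is not the authors' implementation: the abstract does not specify the scoring function, the dimensions, or how the memory is trained, so the function names (`softmax`, `read_memory`), the toy `memory` rows, and the `query` vector are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_memory(query, memory):
    """Attend over memory rows with a content query.

    Assumption: dot-product scoring. Returns the attention weights and
    the weighted sum of the memory rows (the extracted "style" vector).
    """
    scores = [sum(q * m for q, m in zip(query, row)) for row in memory]
    weights = softmax(scores)
    dim = len(memory[0])
    readout = [
        sum(w * row[d] for w, row in zip(weights, memory))
        for d in range(dim)
    ]
    return weights, readout


# Hypothetical toy data: three style-phrase embeddings and one content query.
memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]

weights, style_vector = read_memory(query, memory)
print(weights)
print(style_vector)
```

In this toy setup the first and third memory rows score equally against the query and so receive equal weight, while the second row is down-weighted. How the resulting vector is incorporated into the language model is not stated in the abstract and is left out here.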

Original language	English
Title of host publication	AAAI 2020 - 34th AAAI Conference on Artificial Intelligence
Publisher	AAAI press
Pages	12984-12992
Number of pages	9
ISBN (Electronic)	9781577358350
Publication status	Published - 2020
Event	34th AAAI Conference on Artificial Intelligence, AAAI 2020 - New York, United States; Duration: 7 Feb 2020 → 12 Feb 2020

Publication series

Name	AAAI 2020 - 34th AAAI Conference on Artificial Intelligence

Conference

Conference	34th AAAI Conference on Artificial Intelligence, AAAI 2020
Country/Territory	United States
City	New York
Period	7/02/20 → 12/02/20

Cite this

Zhao, W., Wu, X., & Zhang, X. (2020). MemCap: Memorizing style knowledge for image captioning. In AAAI 2020 - 34th AAAI Conference on Artificial Intelligence (pp. 12984-12992). (AAAI 2020 - 34th AAAI Conference on Artificial Intelligence). AAAI press.

@inproceedings{2bda367ca11241b48c6c61c911cd5638,

title = "MemCap: Memorizing style knowledge for image captioning",

abstract = "Generating stylized captions for images is a challenging task since it requires not only describing the content of the image accurately but also expressing the desired linguistic style appropriately. In this paper, we propose MemCap, a novel stylized image captioning method that explicitly encodes the knowledge about linguistic styles with memory mechanism. Rather than relying heavily on a language model to capture style factors in existing methods, our method resorts to memorizing stylized elements learned from training corpus. Particularly, we design a memory module that comprises a set of embedding vectors for encoding style-related phrases in training corpus. To acquire the style-related phrases, we develop a sentence decomposing algorithm that splits a stylized sentence into a style-related part that reflects the linguistic style and a content-related part that contains the visual content. When generating captions, our MemCap first extracts content-relevant style knowledge from the memory module via an attention mechanism and then incorporates the extracted knowledge into a language model. Extensive experiments on two stylized image captioning datasets (SentiCap and FlickrStyle10K) demonstrate the effectiveness of our method.",

author = "Wentian Zhao and Xinxiao Wu and Xiaoxun Zhang",

note = "Publisher Copyright: {\textcopyright} 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.; 34th AAAI Conference on Artificial Intelligence, AAAI 2020 ; Conference date: 07-02-2020 Through 12-02-2020",

year = "2020",

language = "English",

series = "AAAI 2020 - 34th AAAI Conference on Artificial Intelligence",

publisher = "AAAI press",

pages = "12984--12992",

booktitle = "AAAI 2020 - 34th AAAI Conference on Artificial Intelligence",

}

TY - GEN

T1 - MemCap: Memorizing style knowledge for image captioning

T2 - 34th AAAI Conference on Artificial Intelligence, AAAI 2020

AU - Zhao, Wentian

AU - Wu, Xinxiao

AU - Zhang, Xiaoxun

PY - 2020

Y1 - 2020

N2 - Generating stylized captions for images is a challenging task since it requires not only describing the content of the image accurately but also expressing the desired linguistic style appropriately. In this paper, we propose MemCap, a novel stylized image captioning method that explicitly encodes the knowledge about linguistic styles with memory mechanism. Rather than relying heavily on a language model to capture style factors in existing methods, our method resorts to memorizing stylized elements learned from training corpus. Particularly, we design a memory module that comprises a set of embedding vectors for encoding style-related phrases in training corpus. To acquire the style-related phrases, we develop a sentence decomposing algorithm that splits a stylized sentence into a style-related part that reflects the linguistic style and a content-related part that contains the visual content. When generating captions, our MemCap first extracts content-relevant style knowledge from the memory module via an attention mechanism and then incorporates the extracted knowledge into a language model. Extensive experiments on two stylized image captioning datasets (SentiCap and FlickrStyle10K) demonstrate the effectiveness of our method.

UR - http://www.scopus.com/inward/record.url?scp=85091307052&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85091307052

T3 - AAAI 2020 - 34th AAAI Conference on Artificial Intelligence

SP - 12984

EP - 12992

BT - AAAI 2020 - 34th AAAI Conference on Artificial Intelligence

PB - AAAI press

Y2 - 7 February 2020 through 12 February 2020

ER -
