A deep reinforced training method for location-based image captioning

Lei Zhao; Chunxia Zhang; Xi Zhang; Yating Hu; Zhendong Niu

doi:10.1007/978-3-319-97304-3_67

Lei Zhao, Chunxia Zhang, Xi Zhang, Yating Hu^*, Zhendong Niu

^*Corresponding author for this work

School of Computer Science and Technology

Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Neural encoder-decoder frameworks have been used extensively in image captioning. Recent research has shown that reinforcement learning can train these frameworks directly on non-differentiable evaluation metrics. However, captions generated this way usually have limited grammaticality and readability. In this paper, we propose a novel model with a location-based mechanism that introduces the location information of each region in the image, together with a training method that combines cross-entropy loss and reinforcement learning. We evaluate our model on four public benchmarks: Flickr8k, Flickr30k, MSCOCO and Image Chinese Captioning (ICC). Experimental results show that our model improves the readability of the generated captions and outperforms state-of-the-art methods across different evaluation metrics.
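
The abstract does not say how the cross-entropy and reinforcement learning objectives are combined. As a minimal sketch only, assuming a simple weighted sum of a cross-entropy term and a REINFORCE-style term with a baseline (one common way to mix such losses, not necessarily the authors' formulation), the idea might look like this. The function names, the `weight` parameter, and the reward and baseline values are all hypothetical.

```python
import math


def cross_entropy_loss(ref_log_probs):
    """Negative log-likelihood of the reference caption tokens."""
    return -sum(ref_log_probs)


def reinforce_loss(sampled_log_probs, reward, baseline):
    """REINFORCE surrogate loss for one sampled caption.

    `reward` stands for a sentence-level metric score of the sampled
    caption and `baseline` for a reference score used to reduce
    variance; both are assumptions made for this illustration.
    """
    return -(reward - baseline) * sum(sampled_log_probs)


def combined_loss(ref_log_probs, sampled_log_probs, reward, baseline, weight=0.5):
    """Weighted sum of the two terms.

    `weight` is a hypothetical mixing coefficient in [0, 1]:
    0 gives pure cross-entropy, 1 gives pure reinforcement learning.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    xe = cross_entropy_loss(ref_log_probs)
    rl = reinforce_loss(sampled_log_probs, reward, baseline)
    return weight * rl + (1.0 - weight) * xe


# Toy values: two reference tokens and one sampled token, each with probability 0.5.
ref = [math.log(0.5), math.log(0.5)]
sampled = [math.log(0.5)]
loss = combined_loss(ref, sampled, reward=1.0, baseline=0.5, weight=0.5)
```

Under this assumed form, the cross-entropy term keeps the model close to the human-written reference captions, which is one plausible way to address the readability problem the abstract attributes to training with reinforcement learning alone.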

Original language	English
Title of host publication	PRICAI 2018
Subtitle of host publication	Trends in Artificial Intelligence - 15th Pacific Rim International Conference on Artificial Intelligence, Proceedings
Editors	Byeong-Ho Kang, Xin Geng
Publisher	Springer Verlag
Pages	878-890
Number of pages	13
ISBN (Print)	9783319973036
DOIs	https://doi.org/10.1007/978-3-319-97304-3_67
Publication status	Published - 2018
Event	15th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2018 - Nanjing, China; Duration: 28 Aug 2018 → 31 Aug 2018

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	11012 LNAI
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	15th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2018
Country/Territory	China
City	Nanjing
Period	28/08/18 → 31/08/18

Keywords

Combined training
Image captioning
Location-based mechanism

Cite this

Zhao, L., Zhang, C., Zhang, X., Hu, Y., & Niu, Z. (2018). A deep reinforced training method for location-based image captioning. In B.-H. Kang, & X. Geng (Eds.), PRICAI 2018: Trends in Artificial Intelligence - 15th Pacific Rim International Conference on Artificial Intelligence, Proceedings (pp. 878-890). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11012 LNAI). Springer Verlag. https://doi.org/10.1007/978-3-319-97304-3_67

Zhao, Lei ; Zhang, Chunxia ; Zhang, Xi et al. / A deep reinforced training method for location-based image captioning. PRICAI 2018: Trends in Artificial Intelligence - 15th Pacific Rim International Conference on Artificial Intelligence, Proceedings. editor / Byeong-Ho Kang ; Xin Geng. Springer Verlag, 2018. pp. 878-890 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{4284cca55a2c405da70dde9157da36fe,
  title = "A deep reinforced training method for location-based image captioning",
  abstract = "Neural encoder-decoder frameworks have been used extensively in image captioning. Recent research has shown that reinforcement learning can be utilized to train these frameworks directly on non-differentiable evaluation metrics. However, the captions generated by this method usually have limited grammaticality and readability. In this paper, we propose a novel model with the location-based mechanism which introduces the location information of each region in the image, and a combined training method that combines the cross entropy loss and reinforcement learning. We evaluate our model on four public benchmarks: Flickr8k, Flickr30k, MSCOCO and Image Chinese Captioning (ICC). Experimental results show that our model can improve the readability of the generated captions and outperforms the state-of-the-art methods across different evaluation metrics.",
  keywords = "Combined training, Image captioning, Location-based mechanism",
  author = "Lei Zhao and Chunxia Zhang and Xi Zhang and Yating Hu and Zhendong Niu",
  note = "Publisher Copyright: {\textcopyright} Springer Nature Switzerland AG 2018.; 15th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2018 ; Conference date: 28-08-2018 Through 31-08-2018",
  year = "2018",
  doi = "10.1007/978-3-319-97304-3_67",
  language = "English",
  isbn = "9783319973036",
  series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
  publisher = "Springer Verlag",
  pages = "878--890",
  editor = "Byeong-Ho Kang and Xin Geng",
  booktitle = "PRICAI 2018",
  address = "Germany",
}

Zhao, L, Zhang, C, Zhang, X, Hu, Y & Niu, Z 2018, A deep reinforced training method for location-based image captioning. in B-H Kang & X Geng (eds), PRICAI 2018: Trends in Artificial Intelligence - 15th Pacific Rim International Conference on Artificial Intelligence, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11012 LNAI, Springer Verlag, pp. 878-890, 15th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2018, Nanjing, China, 28/08/18. https://doi.org/10.1007/978-3-319-97304-3_67

A deep reinforced training method for location-based image captioning. / Zhao, Lei; Zhang, Chunxia; Zhang, Xi et al.
PRICAI 2018: Trends in Artificial Intelligence - 15th Pacific Rim International Conference on Artificial Intelligence, Proceedings. ed. / Byeong-Ho Kang; Xin Geng. Springer Verlag, 2018. p. 878-890 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11012 LNAI).

TY - GEN
T1 - A deep reinforced training method for location-based image captioning
AU - Zhao, Lei
AU - Zhang, Chunxia
AU - Zhang, Xi
AU - Hu, Yating
AU - Niu, Zhendong
N1 - Publisher Copyright: © Springer Nature Switzerland AG 2018.
PY - 2018
Y1 - 2018
N2 - Neural encoder-decoder frameworks have been used extensively in image captioning. Recent research has shown that reinforcement learning can be utilized to train these frameworks directly on non-differentiable evaluation metrics. However, the captions generated by this method usually have limited grammaticality and readability. In this paper, we propose a novel model with the location-based mechanism which introduces the location information of each region in the image, and a combined training method that combines the cross entropy loss and reinforcement learning. We evaluate our model on four public benchmarks: Flickr8k, Flickr30k, MSCOCO and Image Chinese Captioning (ICC). Experimental results show that our model can improve the readability of the generated captions and outperforms the state-of-the-art methods across different evaluation metrics.
AB - Neural encoder-decoder frameworks have been used extensively in image captioning. Recent research has shown that reinforcement learning can be utilized to train these frameworks directly on non-differentiable evaluation metrics. However, the captions generated by this method usually have limited grammaticality and readability. In this paper, we propose a novel model with the location-based mechanism which introduces the location information of each region in the image, and a combined training method that combines the cross entropy loss and reinforcement learning. We evaluate our model on four public benchmarks: Flickr8k, Flickr30k, MSCOCO and Image Chinese Captioning (ICC). Experimental results show that our model can improve the readability of the generated captions and outperforms the state-of-the-art methods across different evaluation metrics.
KW - Combined training
KW - Image captioning
KW - Location-based mechanism
UR - http://www.scopus.com/inward/record.url?scp=85051940323&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-97304-3_67
DO - 10.1007/978-3-319-97304-3_67
M3 - Conference contribution
AN - SCOPUS:85051940323
SN - 9783319973036
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 878
EP - 890
BT - PRICAI 2018
A2 - Kang, Byeong-Ho
A2 - Geng, Xin
PB - Springer Verlag
T2 - 15th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2018
Y2 - 28 August 2018 through 31 August 2018
ER -

Zhao L, Zhang C, Zhang X, Hu Y, Niu Z. A deep reinforced training method for location-based image captioning. In Kang BH, Geng X, editors, PRICAI 2018: Trends in Artificial Intelligence - 15th Pacific Rim International Conference on Artificial Intelligence, Proceedings. Springer Verlag. 2018. p. 878-890. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-319-97304-3_67
