A deep reinforced training method for location-based image captioning

Lei Zhao; Chunxia Zhang; Xi Zhang; Yating Hu; Zhendong Niu

doi:10.1007/978-3-319-97304-3_67

Lei Zhao, Chunxia Zhang, Xi Zhang, Yating Hu^*, Zhendong Niu

^*Corresponding author for this work

School of Computer Science

Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Neural encoder-decoder frameworks have been used extensively in image captioning. Recent research has shown that reinforcement learning can be utilized to train these frameworks directly on non-differentiable evaluation metrics. However, the captions generated by this method usually have limited grammaticality and readability. In this paper, we propose a novel model with the location-based mechanism which introduces the location information of each region in the image, and a combined training method that combines the cross entropy loss and reinforcement learning. We evaluate our model on four public benchmarks: Flickr8k, Flickr30k, MSCOCO and Image Chinese Captioning (ICC). Experimental results show that our model can improve the readability of the generated captions and outperforms the state-of-the-art methods across different evaluation metrics.
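The abstract does not spell out how the two objectives are combined. As a minimal sketch only, assuming a simple weighted sum, a mixed cross-entropy and policy-gradient loss for a single caption could look like the following. The weight `lam`, the function names, and the use of a scalar baseline are illustrative assumptions, not details taken from the paper.

```python
def cross_entropy_loss(gt_log_probs):
    """Negative mean log-probability the model assigns to the reference tokens."""
    return -sum(gt_log_probs) / len(gt_log_probs)


def reinforce_loss(sampled_log_probs, reward, baseline):
    """REINFORCE-style surrogate loss for one sampled caption.

    The advantage (reward - baseline) scales the summed log-probability of
    the sampled tokens, so minimizing this term raises the probability of
    captions that score above the baseline on the evaluation metric.
    """
    return -(reward - baseline) * sum(sampled_log_probs)


def combined_loss(gt_log_probs, sampled_log_probs, reward, baseline, lam=0.5):
    """Hypothetical weighted sum: lam * RL term + (1 - lam) * cross-entropy term."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    xe = cross_entropy_loss(gt_log_probs)
    rl = reinforce_loss(sampled_log_probs, reward, baseline)
    return lam * rl + (1.0 - lam) * xe
```

Under this assumed form, `lam = 0` reduces to ordinary cross-entropy training and `lam = 1` to pure policy-gradient training on the metric reward; intermediate values keep a likelihood term in the objective alongside the reward term.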

Original language	English
Title of host publication	PRICAI 2018
Subtitle of host publication	Trends in Artificial Intelligence - 15th Pacific Rim International Conference on Artificial Intelligence, Proceedings
Editors	Byeong-Ho Kang, Xin Geng
Publisher	Springer Verlag
Pages	878-890
Number of pages	13
ISBN (Print)	9783319973036
DOI	https://doi.org/10.1007/978-3-319-97304-3_67
Publication status	Published - 2018
Event	15th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2018 - Nanjing, China. Duration: 28 Aug 2018 → 31 Aug 2018

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	11012 LNAI
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	15th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2018
Country/Territory	China
City	Nanjing
Period	28/08/18 → 31/08/18

Cite this

Zhao, L., Zhang, C., Zhang, X., Hu, Y., & Niu, Z. (2018). A deep reinforced training method for location-based image captioning. In B.-H. Kang, & X. Geng (Eds.), PRICAI 2018: Trends in Artificial Intelligence - 15th Pacific Rim International Conference on Artificial Intelligence, Proceedings (pp. 878-890). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11012 LNAI). Springer Verlag. https://doi.org/10.1007/978-3-319-97304-3_67

Zhao, Lei ; Zhang, Chunxia ; Zhang, Xi et al. / A deep reinforced training method for location-based image captioning. PRICAI 2018: Trends in Artificial Intelligence - 15th Pacific Rim International Conference on Artificial Intelligence, Proceedings. ed. / Byeong-Ho Kang ; Xin Geng. Springer Verlag, 2018. pp. 878-890 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{4284cca55a2c405da70dde9157da36fe,

title = "A deep reinforced training method for location-based image captioning",

abstract = "Neural encoder-decoder frameworks have been used extensively in image captioning. Recent research has shown that reinforcement learning can be utilized to train these frameworks directly on non-differentiable evaluation metrics. However, the captions generated by this method usually have limited grammaticality and readability. In this paper, we propose a novel model with the location-based mechanism which introduces the location information of each region in the image, and a combined training method that combines the cross entropy loss and reinforcement learning. We evaluate our model on four public benchmarks: Flickr8k, Flickr30k, MSCOCO and Image Chinese Captioning (ICC). Experimental results show that our model can improve the readability of the generated captions and outperforms the state-of-the-art methods across different evaluation metrics.",

keywords = "Combined training, Image captioning, Location-based mechanism",

author = "Lei Zhao and Chunxia Zhang and Xi Zhang and Yating Hu and Zhendong Niu",

note = "Publisher Copyright: {\textcopyright} Springer Nature Switzerland AG 2018.; 15th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2018 ; Conference date: 28-08-2018 Through 31-08-2018",

year = "2018",

doi = "10.1007/978-3-319-97304-3_67",

language = "English",

isbn = "9783319973036",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Verlag",

pages = "878--890",

editor = "Byeong-Ho Kang and Xin Geng",

booktitle = "PRICAI 2018",

address = "Germany",

}

Zhao, L, Zhang, C, Zhang, X, Hu, Y & Niu, Z 2018, A deep reinforced training method for location-based image captioning. in B-H Kang & X Geng (eds), PRICAI 2018: Trends in Artificial Intelligence - 15th Pacific Rim International Conference on Artificial Intelligence, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11012 LNAI, Springer Verlag, pp. 878-890, 15th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2018, Nanjing, China, 28/08/18. https://doi.org/10.1007/978-3-319-97304-3_67

A deep reinforced training method for location-based image captioning. / Zhao, Lei; Zhang, Chunxia; Zhang, Xi et al.
PRICAI 2018: Trends in Artificial Intelligence - 15th Pacific Rim International Conference on Artificial Intelligence, Proceedings. ed. / Byeong-Ho Kang; Xin Geng. Springer Verlag, 2018. pp. 878-890 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11012 LNAI).

TY - GEN

T1 - A deep reinforced training method for location-based image captioning

AU - Zhao, Lei

AU - Zhang, Chunxia

AU - Zhang, Xi

AU - Hu, Yating

AU - Niu, Zhendong

N1 - Publisher Copyright: © Springer Nature Switzerland AG 2018.

PY - 2018

Y1 - 2018

N2 - Neural encoder-decoder frameworks have been used extensively in image captioning. Recent research has shown that reinforcement learning can be utilized to train these frameworks directly on non-differentiable evaluation metrics. However, the captions generated by this method usually have limited grammaticality and readability. In this paper, we propose a novel model with the location-based mechanism which introduces the location information of each region in the image, and a combined training method that combines the cross entropy loss and reinforcement learning. We evaluate our model on four public benchmarks: Flickr8k, Flickr30k, MSCOCO and Image Chinese Captioning (ICC). Experimental results show that our model can improve the readability of the generated captions and outperforms the state-of-the-art methods across different evaluation metrics.

AB - Neural encoder-decoder frameworks have been used extensively in image captioning. Recent research has shown that reinforcement learning can be utilized to train these frameworks directly on non-differentiable evaluation metrics. However, the captions generated by this method usually have limited grammaticality and readability. In this paper, we propose a novel model with the location-based mechanism which introduces the location information of each region in the image, and a combined training method that combines the cross entropy loss and reinforcement learning. We evaluate our model on four public benchmarks: Flickr8k, Flickr30k, MSCOCO and Image Chinese Captioning (ICC). Experimental results show that our model can improve the readability of the generated captions and outperforms the state-of-the-art methods across different evaluation metrics.

KW - Combined training

KW - Image captioning

KW - Location-based mechanism

UR - http://www.scopus.com/inward/record.url?scp=85051940323&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-97304-3_67

DO - 10.1007/978-3-319-97304-3_67

M3 - Conference contribution

AN - SCOPUS:85051940323

SN - 9783319973036

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 878

EP - 890

BT - PRICAI 2018

A2 - Kang, Byeong-Ho

A2 - Geng, Xin

PB - Springer Verlag

T2 - 15th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2018

Y2 - 28 August 2018 through 31 August 2018

ER -

Zhao L, Zhang C, Zhang X, Hu Y, Niu Z. A deep reinforced training method for location-based image captioning. In Kang BH, Geng X, editors, PRICAI 2018: Trends in Artificial Intelligence - 15th Pacific Rim International Conference on Artificial Intelligence, Proceedings. Springer Verlag. 2018. p. 878-890. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-319-97304-3_67
