Recurrent attention LSTM model for image Chinese caption generation

Chaoying Zhang, Yaping Dai, Yanyan Cheng, Zhiyang Jia, Kaoru Hirota

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract

A Recurrent Attention LSTM (RAL) model is proposed for image Chinese caption generation. The model uses Google's Inception-v4 CNN to extract image features, while a recurrent attention LSTM mechanism determines the feature weights. By weighting image regions, the model generates words more accurately, producing more relevant descriptions and improving the efficiency of the system. Compared with the Neural Image Caption (NIC) model, experimental results on the AI Challenger Image Chinese Captioning dataset show that the proposed model improves performance by 1.8% on the BLEU-4 metric and 6.2% on the CIDEr metric.
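The abstract describes weighting CNN region features with an attention mechanism before feeding them to the LSTM decoder. The sketch below illustrates one such step with additive soft attention over region features; it is a minimal NumPy illustration under common assumptions, not the authors' implementation, and all names (`soft_attention`, `W_v`, `W_h`, `w_a`) are hypothetical.

```python
import numpy as np

def soft_attention(regions, hidden, W_v, W_h, w_a):
    # regions: (k, d) CNN features, one row per image region.
    # hidden: (n,) current LSTM hidden state.
    # Additive attention: one scalar score per region (illustrative form).
    scores = np.tanh(regions @ W_v + hidden @ W_h) @ w_a   # shape (k,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                                # softmax over regions
    context = weights @ regions                             # (d,) weighted feature
    return context, weights

# Toy dimensions: 4 regions, 8-dim features, 6-dim hidden, 5-dim attention.
rng = np.random.default_rng(0)
k, d, n, a = 4, 8, 6, 5
ctx, w = soft_attention(rng.normal(size=(k, d)), rng.normal(size=n),
                        rng.normal(size=(d, a)), rng.normal(size=(n, a)),
                        rng.normal(size=a))
```

The context vector `ctx` would then be concatenated with the previous word embedding as the LSTM input at each decoding step, so each generated word can focus on different image regions.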

Original language: English
Title of host publication: Proceedings - 2018 Joint 10th International Conference on Soft Computing and Intelligent Systems and 19th International Symposium on Advanced Intelligent Systems, SCIS-ISIS 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 808-813
Number of pages: 6
ISBN (Electronic): 9781538626337
DOIs
Publication status: Published - 2 Jul 2018
Event: Joint 10th International Conference on Soft Computing and Intelligent Systems and 19th International Symposium on Advanced Intelligent Systems, SCIS-ISIS 2018 - Toyama, Japan
Duration: 5 Dec 2018 - 8 Dec 2018

Publication series

Name: Proceedings - 2018 Joint 10th International Conference on Soft Computing and Intelligent Systems and 19th International Symposium on Advanced Intelligent Systems, SCIS-ISIS 2018

Conference

Conference: Joint 10th International Conference on Soft Computing and Intelligent Systems and 19th International Symposium on Advanced Intelligent Systems, SCIS-ISIS 2018
Country/Territory: Japan
City: Toyama
Period: 5/12/18 - 8/12/18

Keywords

  • Convolutional Neural Network
  • Image Chinese Caption Generation
  • Long Short-Term Memory
  • Recurrent Attention LSTM
