Image Semantic Feature Multiple Interactive Network for Remote Sensing Image Captioning

Junzhu Hou, Wei Li, Yang Li, Qiaoyi Li, Qiyuan Cheng, Zhengjie Wang*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Remote sensing image captioning takes remote sensing images as input and outputs accurate, comprehensive and fluent texts, and is therefore widely used in disaster warning, disaster rescue, geographic positioning and other fields. Traditional remote sensing image captioning methods usually use a convolutional neural network (CNN) as the encoder to extract image features and a recurrent neural network (RNN) as the decoder to generate texts. However, the image features extracted by the CNN encoder lack semantic information that corresponds directly to the texts, and the RNN decoder cannot make full use of the features extracted by the encoder, so the generated texts are not accurate or rich enough. To address these two problems, we propose an image semantic feature multiple interactive network based on the Encoder-Decoder model. We use the pre-trained image encoder of CLIP as our remote sensing image semantic feature extraction network; by extracting features that are highly sensitive to image semantic information, it narrows the modal gap between input images and output texts. The multiple interactive network serves as our decoder. To prevent feature redundancy, we incorporate a gated recurrent unit (GRU) network into the multiple interactive network so that the features fully interact and are fully utilized. Experimental results show that our proposed network generates richer, more accurate and more comprehensive texts than the comparison methods.
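The abstract does not say how the GRU is wired into the decoder, so the following is only a minimal sketch of the general idea of GRU-style gating used to merge several feature vectors without simply stacking them. Every name here (`GRUGate`, `fuse_layers`, the weight shapes, the random initialisation) is a hypothetical choice for illustration and is not taken from the paper; a plain list of floats stands in for the CLIP image feature.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector (list of floats).
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def init_matrix(rows, cols, rng):
    # Small random weights; a trained model would learn these instead.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUGate:
    """Hypothetical GRU-style gate that merges a new feature x into a state h."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.Wz = init_matrix(dim, 2 * dim, rng)  # update gate weights
        self.Wr = init_matrix(dim, 2 * dim, rng)  # reset gate weights
        self.Wh = init_matrix(dim, 2 * dim, rng)  # candidate state weights

    def __call__(self, h, x):
        z = [sigmoid(v) for v in matvec(self.Wz, h + x)]  # how much to update
        r = [sigmoid(v) for v in matvec(self.Wr, h + x)]  # how much of h to expose
        gated_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(v) for v in matvec(self.Wh, gated_h + x)]
        # Blend the old state and the candidate, element by element.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def fuse_layers(image_feature, layer_outputs, gate):
    """Fold successive (assumed) interaction outputs into one gated state."""
    state = list(image_feature)
    for out in layer_outputs:
        state = gate(state, list(out))
    return state


# Toy usage: a 4-dimensional stand-in image feature and two layer outputs.
gate = GRUGate(dim=4, seed=0)
fused = fuse_layers(
    [0.5, -0.5, 0.1, 0.0],
    [[0.2, 0.1, -0.3, 0.4], [0.0, 0.9, -0.9, 0.1]],
    gate,
)
```

Because the update gate blends the previous state with the candidate rather than concatenating them, the fused vector keeps the input dimensionality, which is one plausible reading of how gating can limit feature redundancy.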

Original language: English
Title of host publication: Proceedings of 2024 Chinese Intelligent Systems Conference
Editors: Yingmin Jia, Weicun Zhang, Yongling Fu, Huihua Yang
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 63-74
Number of pages: 12
ISBN (Print): 9789819786572
Publication status: Published - 2024
Event: 20th Chinese Intelligent Systems Conference, CISC 2024 - Guilin, China
Duration: 26 Oct 2024 → 27 Oct 2024

Publication series

Name: Lecture Notes in Electrical Engineering
Volume: 1285 LNEE
ISSN (Print): 1876-1100
ISSN (Electronic): 1876-1119

Conference

Conference: 20th Chinese Intelligent Systems Conference, CISC 2024
Country/Territory: China
City: Guilin
Period: 26/10/24 → 27/10/24

Keywords

  • Pre-trained model
  • Remote sensing image captioning
  • Transformer
