Military Image Captioning for Low-Altitude UAV or UGV Perspectives

Lizhi Pan; Chengtian Song; Xiaozheng Gan; Keyu Xu; Yue Xie

doi:10.3390/drones8090421


Lizhi Pan, Chengtian Song^*, Xiaozheng Gan, Keyu Xu, Yue Xie

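^*Corresponding author for this work

School of Mechatronical Engineering

Research output: Contribution to journal › Article › peer-review

Abstract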

Low-altitude unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs), which boast high-resolution imaging and agile maneuvering capabilities, are widely utilized in military scenarios and generate a vast amount of image data that can be leveraged for textual intelligence generation to support military decision making. Military image captioning (MilitIC), as a visual-language learning task, provides innovative solutions for military image understanding and intelligence generation. However, the scarcity of military image datasets hinders the advancement of MilitIC methods, especially those based on deep learning. To overcome this limitation, we introduce an open-access benchmark dataset, which was termed the Military Objects in Real Combat (MOCO) dataset. It features real combat images captured from the perspective of low-altitude UAVs or UGVs, along with a comprehensive set of captions. Furthermore, we propose a novel encoder–augmentation–decoder image-captioning architecture with a map augmentation embedding (MAE) mechanism, MAE-MilitIC, which leverages both image and text modalities as a guiding prefix for caption generation and bridges the semantic gap between visual and textual data. The MAE mechanism maps both image and text embeddings onto a semantic subspace constructed by relevant military prompts, and augments the military semantics of the image embeddings with attribute-explicit text embeddings. Finally, we demonstrate through extensive experiments that MAE-MilitIC surpasses existing models in performance on two challenging datasets, which provides strong support for intelligence warfare based on military UAVs and UGVs.
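The abstract describes the MAE mechanism only at a high level: image and text embeddings are mapped onto a semantic subspace built from military prompts, and the image embedding is then augmented with the text embedding. The following is a minimal sketch of that idea, not the authors' implementation. It assumes an orthogonal projection onto the span of the prompt embeddings and a simple linear mix controlled by a hypothetical weight `alpha`; the function names `orthonormal_basis`, `project`, and `augment` are illustrative and do not come from the paper.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt: return an orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis


def project(x, basis):
    """Orthogonal projection of `x` onto the span of an orthonormal basis."""
    out = [0.0] * len(x)
    for b in basis:
        c = dot(x, b)
        out = [oi + c * bi for oi, bi in zip(out, b)]
    return out


def augment(image_emb, text_emb, prompt_embs, alpha=0.5):
    """Project both embeddings onto the prompt subspace, then mix them.

    `alpha` is an assumed mixing weight; the abstract does not specify
    how the text embedding is combined with the image embedding.
    """
    basis = orthonormal_basis(prompt_embs)
    img_p = project(image_emb, basis)
    txt_p = project(text_emb, basis)
    return [(1 - alpha) * i + alpha * t for i, t in zip(img_p, txt_p)]


# Toy 3-dimensional example: two prompt embeddings span the x-y plane.
prompts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
prefix = augment([1.0, 2.0, 3.0], [3.0, 0.0, 5.0], prompts, alpha=0.5)
print(prefix)
```

In this toy case the component of each embedding outside the prompt subspace (the third coordinate) is discarded before mixing. In a real system the embeddings would come from a pretrained vision-language encoder and the result would feed the caption decoder as a prefix, details the abstract leaves open.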

Original language	English
Article number	421
Journal	Drones
Volume	8
Issue number	9
DOI	https://doi.org/10.3390/drones8090421
Publication status	Published - Sep 2024

Cite this

@article{11b670b628b74839a5405cb92cb4cd35,

title = "Military Image Captioning for Low-Altitude UAV or UGV Perspectives",

abstract = "Low-altitude unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs), which boast high-resolution imaging and agile maneuvering capabilities, are widely utilized in military scenarios and generate a vast amount of image data that can be leveraged for textual intelligence generation to support military decision making. Military image captioning (MilitIC), as a visual-language learning task, provides innovative solutions for military image understanding and intelligence generation. However, the scarcity of military image datasets hinders the advancement of MilitIC methods, especially those based on deep learning. To overcome this limitation, we introduce an open-access benchmark dataset, which was termed the Military Objects in Real Combat (MOCO) dataset. It features real combat images captured from the perspective of low-altitude UAVs or UGVs, along with a comprehensive set of captions. Furthermore, we propose a novel encoder–augmentation–decoder image-captioning architecture with a map augmentation embedding (MAE) mechanism, MAE-MilitIC, which leverages both image and text modalities as a guiding prefix for caption generation and bridges the semantic gap between visual and textual data. The MAE mechanism maps both image and text embeddings onto a semantic subspace constructed by relevant military prompts, and augments the military semantics of the image embeddings with attribute-explicit text embeddings. Finally, we demonstrate through extensive experiments that MAE-MilitIC surpasses existing models in performance on two challenging datasets, which provides strong support for intelligence warfare based on military UAVs and UGVs.",

keywords = "image understanding, military image captioning, unmanned aerial vehicle (UAV), unmanned ground vehicle (UGV), visual-language model",

author = "Lizhi Pan and Chengtian Song and Xiaozheng Gan and Keyu Xu and Yue Xie",

note = "Publisher Copyright: {\textcopyright} 2024 by the authors.",

year = "2024",

month = sep,

doi = "10.3390/drones8090421",

language = "English",

volume = "8",

journal = "Drones",

issn = "2504-446X",

publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",

number = "9",

}

TY - JOUR

T1 - Military Image Captioning for Low-Altitude UAV or UGV Perspectives

AU - Pan, Lizhi

AU - Song, Chengtian

AU - Gan, Xiaozheng

AU - Xu, Keyu

AU - Xie, Yue

PY - 2024/9

Y1 - 2024/9

N2 - Low-altitude unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs), which boast high-resolution imaging and agile maneuvering capabilities, are widely utilized in military scenarios and generate a vast amount of image data that can be leveraged for textual intelligence generation to support military decision making. Military image captioning (MilitIC), as a visual-language learning task, provides innovative solutions for military image understanding and intelligence generation. However, the scarcity of military image datasets hinders the advancement of MilitIC methods, especially those based on deep learning. To overcome this limitation, we introduce an open-access benchmark dataset, which was termed the Military Objects in Real Combat (MOCO) dataset. It features real combat images captured from the perspective of low-altitude UAVs or UGVs, along with a comprehensive set of captions. Furthermore, we propose a novel encoder–augmentation–decoder image-captioning architecture with a map augmentation embedding (MAE) mechanism, MAE-MilitIC, which leverages both image and text modalities as a guiding prefix for caption generation and bridges the semantic gap between visual and textual data. The MAE mechanism maps both image and text embeddings onto a semantic subspace constructed by relevant military prompts, and augments the military semantics of the image embeddings with attribute-explicit text embeddings. Finally, we demonstrate through extensive experiments that MAE-MilitIC surpasses existing models in performance on two challenging datasets, which provides strong support for intelligence warfare based on military UAVs and UGVs.

AB - Low-altitude unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs), which boast high-resolution imaging and agile maneuvering capabilities, are widely utilized in military scenarios and generate a vast amount of image data that can be leveraged for textual intelligence generation to support military decision making. Military image captioning (MilitIC), as a visual-language learning task, provides innovative solutions for military image understanding and intelligence generation. However, the scarcity of military image datasets hinders the advancement of MilitIC methods, especially those based on deep learning. To overcome this limitation, we introduce an open-access benchmark dataset, which was termed the Military Objects in Real Combat (MOCO) dataset. It features real combat images captured from the perspective of low-altitude UAVs or UGVs, along with a comprehensive set of captions. Furthermore, we propose a novel encoder–augmentation–decoder image-captioning architecture with a map augmentation embedding (MAE) mechanism, MAE-MilitIC, which leverages both image and text modalities as a guiding prefix for caption generation and bridges the semantic gap between visual and textual data. The MAE mechanism maps both image and text embeddings onto a semantic subspace constructed by relevant military prompts, and augments the military semantics of the image embeddings with attribute-explicit text embeddings. Finally, we demonstrate through extensive experiments that MAE-MilitIC surpasses existing models in performance on two challenging datasets, which provides strong support for intelligence warfare based on military UAVs and UGVs.

KW - image understanding

KW - military image captioning

KW - unmanned aerial vehicle (UAV)

KW - unmanned ground vehicle (UGV)

KW - visual-language model

UR - http://www.scopus.com/inward/record.url?scp=85205052778&partnerID=8YFLogxK

U2 - 10.3390/drones8090421

DO - 10.3390/drones8090421

M3 - Article

AN - SCOPUS:85205052778

SN - 2504-446X

VL - 8

JO - Drones

JF - Drones

IS - 9

M1 - 421

ER -
