TY - JOUR
T1 - Popeye
T2 - A Unified Visual-Language Model for Multi-Source Ship Detection From Remote Sensing Imagery
AU - Zhang, Wei
AU - Cai, Miaoxin
AU - Zhang, Tong
AU - Lei, Guoqiang
AU - Zhuang, Yin
AU - Mao, Xuerui
N1 - Publisher Copyright:
© 2008-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - Ship detection needs to identify ship locations from remote sensing (RS) scenes. Due to different imaging payloads, various appearances of ships, and complicated background interference from the bird's eye view, it is difficult to set up a unified paradigm for achieving multi-source ship detection. To address this challenge, in this article, leveraging the large language models (LLMs)'s powerful generalization ability, a unified visual-language model called Popeye is proposed for multi-source ship detection from RS imagery. Specifically, to bridge the interpretation gap across the multi-source images for ship detection, a novel unified labeling paradigm is designed to integrate different visual modalities and the various ship detection ways, i.e., horizontal bounding box (HBB) and oriented bounding box (OBB). Subsequently, the hybrid experts encoder is designed to refine multi-scale visual features, thereby enhancing visual perception. Then, a visual-language alignment method is developed for Popeye to enhance interactive comprehension ability between visual and language content. Furthermore, an instruction adaption mechanism is proposed for transferring the pre-trained visual-language knowledge from the nature scene into the RS domain for multi-source ship detection. In addition, the segment anything model (SAM) is also seamlessly integrated into the proposed Popeye to achieve pixel-level ship segmentation without additional training costs. Finally, extensive experiments are conducted on the newly constructed ship instruction dataset named MMShip, and the results indicate that the proposed Popeye outperforms current specialist, open-vocabulary, and other visual-language models in zero-shot multi-source various ship detection tasks.
AB - Ship detection needs to identify ship locations from remote sensing (RS) scenes. Due to different imaging payloads, various appearances of ships, and complicated background interference from the bird's eye view, it is difficult to set up a unified paradigm for achieving multi-source ship detection. To address this challenge, in this article, leveraging the large language models (LLMs)'s powerful generalization ability, a unified visual-language model called Popeye is proposed for multi-source ship detection from RS imagery. Specifically, to bridge the interpretation gap across the multi-source images for ship detection, a novel unified labeling paradigm is designed to integrate different visual modalities and the various ship detection ways, i.e., horizontal bounding box (HBB) and oriented bounding box (OBB). Subsequently, the hybrid experts encoder is designed to refine multi-scale visual features, thereby enhancing visual perception. Then, a visual-language alignment method is developed for Popeye to enhance interactive comprehension ability between visual and language content. Furthermore, an instruction adaption mechanism is proposed for transferring the pre-trained visual-language knowledge from the nature scene into the RS domain for multi-source ship detection. In addition, the segment anything model (SAM) is also seamlessly integrated into the proposed Popeye to achieve pixel-level ship segmentation without additional training costs. Finally, extensive experiments are conducted on the newly constructed ship instruction dataset named MMShip, and the results indicate that the proposed Popeye outperforms current specialist, open-vocabulary, and other visual-language models in zero-shot multi-source various ship detection tasks.
KW - And natural language interaction
KW - multi-source imagery
KW - ship detection
KW - visual-language alignment
UR - http://www.scopus.com/inward/record.url?scp=85208221569&partnerID=8YFLogxK
U2 - 10.1109/JSTARS.2024.3488034
DO - 10.1109/JSTARS.2024.3488034
M3 - Article
AN - SCOPUS:85208221569
SN - 1939-1404
JO - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
JF - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
ER -