TY - JOUR
T1 - Controllable Generative Knowledge-Driven Few-Shot Object Detection from Optical Remote Sensing Imagery
AU - Zhang, Tong
AU - Zhuang, Yin
AU - Wang, Guanqun
AU - Chen, He
AU - Wang, Hao
AU - Li, Lianlin
AU - Li, Jun
N1 - Publisher Copyright:
© 1980-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Few-shot object detection (FSOD) must learn classification and localization information for unseen object detection under very low-data regimes. However, when only deficient samples are available for model training, it is hard to build powerful location-aware and identification abilities to cope well with agnostic bias from diverse testing scenarios; meanwhile, the overfitting phenomenon easily occurs. Therefore, in this article, a controllable generative knowledge-driven FSOD method called CGK-FSOD is proposed for unseen object detection from optical remote sensing imagery. Specifically, to enrich the learnable data space of scarce samples, preventing incomplete agnostic-bias learning while avoiding overfitting, a visual-textual prompt-based controllable data generation scheme is designed to generate high-quality object detection data based on pretrained foundation models [i.e., stable diffusion (SD) and contrastive language-image pre-training (CLIP)], which not only introduces generalized domain-level knowledge into the remote sensing domain but also sets up an all-round data space to support complete learning of potential agnostic bias. Furthermore, with respect to the denoising generative process of SD, a series of cross-modality generative features in the latent representation space is reused for few-shot fine-tuning via the designed cross-modality feature embedding (CMFE), which not only brings diverse generative abilities into the feature fusion step of the detector but also gracefully sets up feature representation scalability so that the detector can better adapt to agnostic bias in the diverse testing scenarios of FSOD. Finally, extensive experiments are conducted on two public remote sensing datasets (i.e., DIOR and NWPU VHR-10), and the results indicate that the proposed CGK-FSOD is highly effective and flexible for FSOD.
AB - Few-shot object detection (FSOD) must learn classification and localization information for unseen object detection under very low-data regimes. However, when only deficient samples are available for model training, it is hard to build powerful location-aware and identification abilities to cope well with agnostic bias from diverse testing scenarios; meanwhile, the overfitting phenomenon easily occurs. Therefore, in this article, a controllable generative knowledge-driven FSOD method called CGK-FSOD is proposed for unseen object detection from optical remote sensing imagery. Specifically, to enrich the learnable data space of scarce samples, preventing incomplete agnostic-bias learning while avoiding overfitting, a visual-textual prompt-based controllable data generation scheme is designed to generate high-quality object detection data based on pretrained foundation models [i.e., stable diffusion (SD) and contrastive language-image pre-training (CLIP)], which not only introduces generalized domain-level knowledge into the remote sensing domain but also sets up an all-round data space to support complete learning of potential agnostic bias. Furthermore, with respect to the denoising generative process of SD, a series of cross-modality generative features in the latent representation space is reused for few-shot fine-tuning via the designed cross-modality feature embedding (CMFE), which not only brings diverse generative abilities into the feature fusion step of the detector but also gracefully sets up feature representation scalability so that the detector can better adapt to agnostic bias in the diverse testing scenarios of FSOD. Finally, extensive experiments are conducted on two public remote sensing datasets (i.e., DIOR and NWPU VHR-10), and the results indicate that the proposed CGK-FSOD is highly effective and flexible for FSOD.
KW - Contrastive language-image pre-training (CLIP)
KW - controllable generation
KW - few-shot object detection (FSOD)
KW - remote sensing
KW - stable diffusion (SD)
KW - visual-textual prompt
UR - http://www.scopus.com/inward/record.url?scp=86000805674&partnerID=8YFLogxK
U2 - 10.1109/TGRS.2025.3541937
DO - 10.1109/TGRS.2025.3541937
M3 - Article
AN - SCOPUS:86000805674
SN - 0196-2892
VL - 63
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
M1 - 5612319
ER -