TY - CONF
T1 - ADVANCING CONTROLLABLE DIFFUSION MODEL FOR FEW-SHOT OBJECT DETECTION IN OPTICAL REMOTE SENSING IMAGERY
AU - Zhang, Tong
AU - Zhuang, Yin
AU - Zhang, Xinyi
AU - Wang, Guanqun
AU - Chen, He
AU - Bi, Fukun
N1 - Publisher Copyright:
©2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Few-shot object detection (FSOD) in optical remote sensing imagery must detect rare objects given only a few annotated bounding boxes. Such limited training data can hardly represent the data distribution of realistic remote sensing scenes, restricting FSOD performance. Recently, learning conditional controls for text-to-image diffusion models has made great progress, enabling the precise generation of controllable yet imaginative images from text prompts and spatially localized input conditions. Accordingly, in this work, we explore the potential of diffusion models and propose a solution for few-shot object detection based on controllable data generation. First, drawing upon a few annotated objects, their bounding boxes and categories are used as spatial conditions and text prompts, respectively, and fed into a large text-to-image diffusion model for controlled image generation. Second, based on the generated images, a data transformation is devised to adapt to the scale and orientation variances of remote sensing objects and boost the robustness of model training. Finally, experiments were conducted on the public remote sensing dataset DIOR, and the results demonstrate the effectiveness of the proposed method.
AB - Few-shot object detection (FSOD) in optical remote sensing imagery must detect rare objects given only a few annotated bounding boxes. Such limited training data can hardly represent the data distribution of realistic remote sensing scenes, restricting FSOD performance. Recently, learning conditional controls for text-to-image diffusion models has made great progress, enabling the precise generation of controllable yet imaginative images from text prompts and spatially localized input conditions. Accordingly, in this work, we explore the potential of diffusion models and propose a solution for few-shot object detection based on controllable data generation. First, drawing upon a few annotated objects, their bounding boxes and categories are used as spatial conditions and text prompts, respectively, and fed into a large text-to-image diffusion model for controlled image generation. Second, based on the generated images, a data transformation is devised to adapt to the scale and orientation variances of remote sensing objects and boost the robustness of model training. Finally, experiments were conducted on the public remote sensing dataset DIOR, and the results demonstrate the effectiveness of the proposed method.
KW - Diffusion model
KW - few-shot object detection
KW - optical remote sensing imagery
KW - stable diffusion
UR - http://www.scopus.com/inward/record.url?scp=85208465427&partnerID=8YFLogxK
U2 - 10.1109/IGARSS53475.2024.10642625
DO - 10.1109/IGARSS53475.2024.10642625
M3 - Paper
AN - SCOPUS:85208465427
SP - 7600
EP - 7603
T2 - 2024 IEEE International Geoscience and Remote Sensing Symposium, IGARSS 2024
Y2 - 7 July 2024 through 12 July 2024
ER -