TY - GEN
T1 - Image Editing based on Diffusion Model for Remote Sensing Image Change Captioning
AU - Cai, Miaoxin
AU - Chen, He
AU - Li, Can
AU - Gan, Shuyu
AU - Chen, Liang
AU - Zhuang, Yin
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Remote Sensing Image Change Captioning (RSICC) is a task that uses natural language to describe changes between remote sensing images of the same area captured at different times. However, the long temporal intervals between multi-temporal images, the infrequency of observable changes, and restrictions on observation locations make it difficult to acquire and annotate a large and diverse dataset for analyzing changes in multi-temporal images. This scarcity of labeled data hinders the training of RSICC models and leads to poor generalization. Compared with annotated, registered bi-temporal image pairs, single-temporal data are much easier to obtain. Therefore, to address the poor generalization of RSICC models under limited annotated-sample conditions, a text-guided image pairs generation (TGIPG) method is proposed to create synthetic RSICC datasets from single-temporal data and randomly sampled text instructions via a diffusion-based controllable image editing model. This approach generates additional valid multi-temporal sample pairs, alleviating the constraint of limited change information. Specifically, the method injects the change information conveyed by language instructions into the diffusion process, gradually transforming the pre-phase image into the post-phase image. Experiments on the LEVIR-CC dataset show that, under a restricted number of training samples, the synthetic data produced by this plug-and-play TGIPG method can significantly enhance the performance of any RSICC model.
AB - Remote Sensing Image Change Captioning (RSICC) is a task that uses natural language to describe changes between remote sensing images of the same area captured at different times. However, the long temporal intervals between multi-temporal images, the infrequency of observable changes, and restrictions on observation locations make it difficult to acquire and annotate a large and diverse dataset for analyzing changes in multi-temporal images. This scarcity of labeled data hinders the training of RSICC models and leads to poor generalization. Compared with annotated, registered bi-temporal image pairs, single-temporal data are much easier to obtain. Therefore, to address the poor generalization of RSICC models under limited annotated-sample conditions, a text-guided image pairs generation (TGIPG) method is proposed to create synthetic RSICC datasets from single-temporal data and randomly sampled text instructions via a diffusion-based controllable image editing model. This approach generates additional valid multi-temporal sample pairs, alleviating the constraint of limited change information. Specifically, the method injects the change information conveyed by language instructions into the diffusion process, gradually transforming the pre-phase image into the post-phase image. Experiments on the LEVIR-CC dataset show that, under a restricted number of training samples, the synthetic data produced by this plug-and-play TGIPG method can significantly enhance the performance of any RSICC model.
KW - Change captioning
KW - diffusion
KW - image generation
KW - remote sensing
UR - http://www.scopus.com/inward/record.url?scp=86000023372&partnerID=8YFLogxK
U2 - 10.1109/ICSIDP62679.2024.10868357
DO - 10.1109/ICSIDP62679.2024.10868357
M3 - Conference contribution
AN - SCOPUS:86000023372
T3 - IEEE International Conference on Signal, Information and Data Processing, ICSIDP 2024
BT - IEEE International Conference on Signal, Information and Data Processing, ICSIDP 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2nd IEEE International Conference on Signal, Information and Data Processing, ICSIDP 2024
Y2 - 22 November 2024 through 24 November 2024
ER -