Diffusion-Geo: A Two-Stage Controllable Text-To-Image Generative Model for Remote Sensing Scenarios

Miaoxin Cai, Wei Zhang, Tong Zhang, Yin Zhuang*, He Chen, Liang Chen, Can Li

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Image generation is a crucial task for facilitating intelligent interpretation in the remote sensing domain. Expanding dataset size through image generation can enhance model performance on downstream tasks. However, current generative models in remote sensing are mostly unconditional or guided by simple text, so the generated images lack spatial and semantic constraints. This lack of control can negatively affect downstream task models. To tackle these challenges, a two-stage controllable text-to-image generative model called Diffusion-Geo is presented. In the first stage, an extensive image-text generation dataset called RS-Control is created through prompt engineering of multimodal large language models (MLLMs) and manual prompts for existing datasets; it incorporates diverse conditional controls with rich spatial and semantic information. The RS-Control dataset is then used to train a universal controllable image generative model. The second stage involves efficiently tuning the universal model for different task datasets, minimizing fine-tuning costs while preserving diversity and high-quality features. Experiments conducted on the RSICD caption dataset and the WHU change detection dataset demonstrate the superiority of Diffusion-Geo over other state-of-the-art models in image generation.

Original language: English
Title of host publication: IGARSS 2024 - 2024 IEEE International Geoscience and Remote Sensing Symposium, Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 7003-7006
Number of pages: 4
ISBN (Electronic): 9798350360325
Publication status: Published - 2024
Event: 2024 IEEE International Geoscience and Remote Sensing Symposium, IGARSS 2024 - Athens, Greece
Duration: 7 Jul 2024 to 12 Jul 2024

Publication series

Name: International Geoscience and Remote Sensing Symposium (IGARSS)

Conference

Conference: 2024 IEEE International Geoscience and Remote Sensing Symposium, IGARSS 2024
Country/Territory: Greece
City: Athens
Period: 7/07/24 to 12/07/24

Keywords

  • controllable text-to-image generation
  • diffusion
  • remote sensing
