PT-RE: Prompt-Based Multimodal Transformer for Road Network Extraction From Remote Sensing Images

Yuxuan Han, Qingxiao Liu, Haiou Liu, Xiuzhong Hu, Boyang Wang*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Road network extraction from remote sensing images can provide precise map information for global positioning and planning. While existing transformer-based methods show promising performance in road network extraction, they suffer from misleading results at crossroads and low generalization ability. In this study, a prompt-based multimodal transformer for road network extraction (PT-RE) is proposed. In PT-RE, a Swin transformer serves as the backbone network to extract image features from remote sensing images. A fine-tuned prompt-based method then generates the road topology classification contexts. Prompt-based information generation and a cross-modal loss function are designed for the fine-tuning task. Compared with the original uni-modal loss function used in fine-tuning, the cross-modal method processes information from different modalities and improves generalization ability. Finally, the topology decoder uses a cross-attention architecture to predict topological relationships from the image information and the classification contexts. By drawing on different views and modalities, the framework detects crossroads more accurately than uni-modal approaches. The proposed topological road network extraction method demonstrates superior accuracy on the 20 U.S. Cities and SpaceNet datasets, showcasing its accuracy and generalization ability.
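The cross-attention step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes single-head scaled dot-product attention with no learned projections, and the names `cross_attention`, `image_tokens`, and `context_tokens` are hypothetical. Image features act as queries, and classification-context embeddings act as keys and values.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    # Single-head scaled dot-product cross-attention (illustrative assumption:
    # no learned projection matrices, unlike a real transformer decoder).
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy stand-ins: three image feature tokens, two context embeddings.
image_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context_tokens = [[1.0, 0.0], [0.0, 1.0]]
fused, attn = cross_attention(image_tokens, context_tokens, context_tokens)
```

Each row of `attn` sums to one, so every image token receives a convex combination of the context embeddings. A practical decoder would add learned query, key, and value projections and multiple heads.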

Original language: English
Pages (from-to): 35832-35844
Number of pages: 13
Journal: IEEE Sensors Journal
Volume: 24
Issue number: 21
Publication status: Published - 2024

Keywords

  • Prompt-based method
  • remote sensing images (RS)
  • road network extraction
  • topology graph
