PT-RE: Prompt-Based Multimodal Transformer for Road Network Extraction From Remote Sensing Images

Yuxuan Han; Qingxiao Liu; Haiou Liu; Xiuzhong Hu; Boyang Wang

doi:10.1109/JSEN.2024.3428483

Corresponding author: Boyang Wang

School of Mechanical Engineering

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Road network extraction from remote sensing images can provide precise map information for global positioning and planning. While existing transformer-based methods show promising performance in road network extraction, they produce misleading results at crossroads and generalize poorly. This study proposes a prompt-based multimodal transformer for road network extraction (PT-RE). In PT-RE, a Swin transformer serves as the backbone network to extract image features from remote sensing images. A fine-tuned prompt-based method then generates the road topology classification contexts. Prompt-based information generation and a cross-modal loss function are designed for the fine-tuning task. Compared with the original uni-modal loss function used in fine-tuning, the cross-modal method processes information from different modalities and improves generalization ability. Finally, the topology decoder uses a cross-attention architecture to predict the relationships from the image information and the classification contexts. By drawing on different views and modalities, the framework detects crossroads more accurately than its uni-modal counterpart. The proposed topological road network extraction method demonstrates superior accuracy on the 20 U.S. Cities and SpaceNet datasets, showing both its accuracy and its generalization ability.

Original language	English
Pages (from-to)	35832-35844
Number of pages	13
Journal	IEEE Sensors Journal
Volume	24
Issue number	21
DOIs	https://doi.org/10.1109/JSEN.2024.3428483
Publication status	Published - 2024

Keywords

Prompt-based method
remote sensing images (RS)
road network extraction
topology graph

Access to Document

10.1109/JSEN.2024.3428483

Cite this

@article{eb445df21021412da27136cced34631c,

title = "PT-RE: Prompt-Based Multimodal Transformer for Road Network Extraction From Remote Sensing Images",

abstract = "Road network extraction from remote sensing images can provide precise map information for global positioning and planning. While existing transformer-based methods show promising performance in road network extraction, they suffer from misleading results of crossroad and low generalization ability. In our study, a prompt-based multimodal transformer for road network extraction (PT-RE) is proposed. In PT-RE, a Swin transformer is used as the backbone network to extract image features from remote sensing images. Then, a fine-tuned prompt-based method is employed to generate the road topology classification contexts. The prompt-based information generation and cross-modal loss function are designed to deal with the fine-tuning task. Compared with the original uni-modal loss function in fine-tuning, the cross-modal method processes the different modal information and improves the generalization ability. Finally, the topology decoder utilizes cross-attention architecture to predict the relationship by the information from images and classification contexts. With the help of different views and modal information, the framework strengthens the accuracy of crossroad detection rather than the uni-modal type. The proposed topological road network extraction method demonstrates superior accuracy across 20 U.S. Cities datasets and SpaceNet datasets, showcasing its accuracy and generalization ability.",

keywords = "Prompt-based method, remote sensing images (RS), road network extraction, topology graph",

author = "Yuxuan Han and Qingxiao Liu and Haiou Liu and Xiuzhong Hu and Boyang Wang",

note = "Publisher Copyright: {\textcopyright} 2024 IEEE.",

year = "2024",

doi = "10.1109/JSEN.2024.3428483",

language = "English",

volume = "24",

pages = "35832--35844",

journal = "IEEE Sensors Journal",

issn = "1530-437X",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "21",

}

TY - JOUR

T1 - PT-RE

T2 - Prompt-Based Multimodal Transformer for Road Network Extraction From Remote Sensing Images

AU - Han, Yuxuan

AU - Liu, Qingxiao

AU - Liu, Haiou

AU - Hu, Xiuzhong

AU - Wang, Boyang

PY - 2024

Y1 - 2024

N2 - Road network extraction from remote sensing images can provide precise map information for global positioning and planning. While existing transformer-based methods show promising performance in road network extraction, they suffer from misleading results of crossroad and low generalization ability. In our study, a prompt-based multimodal transformer for road network extraction (PT-RE) is proposed. In PT-RE, a Swin transformer is used as the backbone network to extract image features from remote sensing images. Then, a fine-tuned prompt-based method is employed to generate the road topology classification contexts. The prompt-based information generation and cross-modal loss function are designed to deal with the fine-tuning task. Compared with the original uni-modal loss function in fine-tuning, the cross-modal method processes the different modal information and improves the generalization ability. Finally, the topology decoder utilizes cross-attention architecture to predict the relationship by the information from images and classification contexts. With the help of different views and modal information, the framework strengthens the accuracy of crossroad detection rather than the uni-modal type. The proposed topological road network extraction method demonstrates superior accuracy across 20 U.S. Cities datasets and SpaceNet datasets, showcasing its accuracy and generalization ability.

AB - Road network extraction from remote sensing images can provide precise map information for global positioning and planning. While existing transformer-based methods show promising performance in road network extraction, they suffer from misleading results of crossroad and low generalization ability. In our study, a prompt-based multimodal transformer for road network extraction (PT-RE) is proposed. In PT-RE, a Swin transformer is used as the backbone network to extract image features from remote sensing images. Then, a fine-tuned prompt-based method is employed to generate the road topology classification contexts. The prompt-based information generation and cross-modal loss function are designed to deal with the fine-tuning task. Compared with the original uni-modal loss function in fine-tuning, the cross-modal method processes the different modal information and improves the generalization ability. Finally, the topology decoder utilizes cross-attention architecture to predict the relationship by the information from images and classification contexts. With the help of different views and modal information, the framework strengthens the accuracy of crossroad detection rather than the uni-modal type. The proposed topological road network extraction method demonstrates superior accuracy across 20 U.S. Cities datasets and SpaceNet datasets, showcasing its accuracy and generalization ability.

KW - Prompt-based method

KW - remote sensing images (RS)

KW - road network extraction

KW - topology graph

UR - http://www.scopus.com/inward/record.url?scp=85201746365&partnerID=8YFLogxK

U2 - 10.1109/JSEN.2024.3428483

DO - 10.1109/JSEN.2024.3428483

M3 - Article

AN - SCOPUS:85201746365

SN - 1530-437X

VL - 24

SP - 35832

EP - 35844

JO - IEEE Sensors Journal

JF - IEEE Sensors Journal

IS - 21

ER -
