TY - JOUR
T1 - PT-RE
T2 - Prompt-Based Multimodal Transformer for Road Network Extraction From Remote Sensing Images
AU - Han, Yuxuan
AU - Liu, Qingxiao
AU - Liu, Haiou
AU - Hu, Xiuzhong
AU - Wang, Boyang
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
AB - Road network extraction from remote sensing images can provide precise map information for global positioning and planning. While existing transformer-based methods show promising performance in road network extraction, they suffer from misleading results at crossroads and low generalization ability. In this study, a prompt-based multimodal transformer for road network extraction (PT-RE) is proposed. In PT-RE, a Swin transformer is used as the backbone network to extract image features from remote sensing images. Then, a fine-tuned prompt-based method is employed to generate the road topology classification contexts. Prompt-based information generation and a cross-modal loss function are designed for the fine-tuning task. Compared with the original uni-modal loss function used in fine-tuning, the cross-modal method processes information from different modalities and improves generalization ability. Finally, the topology decoder uses a cross-attention architecture to predict the relationships using information from the images and the classification contexts. With the help of different views and modal information, the framework detects crossroads more accurately than its uni-modal counterpart. The proposed topological road network extraction method achieves superior accuracy on the 20 U.S. Cities and SpaceNet datasets, demonstrating its generalization ability.
KW - Prompt-based method
KW - remote sensing images (RS)
KW - road network extraction
KW - topology graph
UR - http://www.scopus.com/inward/record.url?scp=85201746365&partnerID=8YFLogxK
U2 - 10.1109/JSEN.2024.3428483
DO - 10.1109/JSEN.2024.3428483
M3 - Article
AN - SCOPUS:85201746365
SN - 1530-437X
VL - 24
SP - 35832
EP - 35844
JO - IEEE Sensors Journal
JF - IEEE Sensors Journal
IS - 21
ER -