TY - JOUR
T1 - Optical and SAR Cross-Modal Hallucination Collaborative Learning for Remote Sensing Missing-Modality Building Footprint Extraction
AU - Wei, Tianyu
AU - Chen, He
AU - Liu, Wenchao
AU - Chen, Liang
AU - Gu, Panzhe
AU - Wang, Jue
N1 - Publisher Copyright:
© 2008-2012 IEEE.
PY - 2026
Y1 - 2026
N2 - Building footprint extraction using optical and synthetic aperture radar (SAR) images enables all-weather operation and significantly boosts performance. In practical scenarios, optical data may be unavailable, leading to the missing-modality challenge. To overcome this challenge, advanced methods combine mainstream knowledge distillation approaches with hallucination network schemes to improve performance. However, under complex SAR backgrounds, current hallucination-network-based methods suffer from cross-modal information transfer failure between the optical and hallucination models. To solve this problem, this study introduces a cross-modal hallucination collaborative learning (CMH-CL) method consisting of two components: modality-share information alignment learning (MSAL) and multimodal fusion information alignment learning (MFAL). MSAL facilitates cross-modal knowledge transfer between the optical and hallucination encoders, enabling the hallucination model to effectively mimic the missing optical modality. MFAL aligns semantic information between the OPT-SAR and HAL-SAR fusion heads to strengthen their semantic consistency, thereby improving HAL-SAR fusion performance. By combining MSAL and MFAL, the CMH-CL method collaboratively alleviates the cross-modal transfer failure problem between the optical and hallucination models, thereby improving performance in missing-modality building footprint extraction. Extensive experimental results obtained on a public dataset demonstrate the effectiveness of the proposed CMH-CL method.
AB - Building footprint extraction using optical and synthetic aperture radar (SAR) images enables all-weather operation and significantly boosts performance. In practical scenarios, optical data may be unavailable, leading to the missing-modality challenge. To overcome this challenge, advanced methods combine mainstream knowledge distillation approaches with hallucination network schemes to improve performance. However, under complex SAR backgrounds, current hallucination-network-based methods suffer from cross-modal information transfer failure between the optical and hallucination models. To solve this problem, this study introduces a cross-modal hallucination collaborative learning (CMH-CL) method consisting of two components: modality-share information alignment learning (MSAL) and multimodal fusion information alignment learning (MFAL). MSAL facilitates cross-modal knowledge transfer between the optical and hallucination encoders, enabling the hallucination model to effectively mimic the missing optical modality. MFAL aligns semantic information between the OPT-SAR and HAL-SAR fusion heads to strengthen their semantic consistency, thereby improving HAL-SAR fusion performance. By combining MSAL and MFAL, the CMH-CL method collaboratively alleviates the cross-modal transfer failure problem between the optical and hallucination models, thereby improving performance in missing-modality building footprint extraction. Extensive experimental results obtained on a public dataset demonstrate the effectiveness of the proposed CMH-CL method.
KW - Building footprint extraction
KW - hallucination networks
KW - modality-missing
KW - modality-share information
KW - synthetic aperture radar (SAR)
UR - https://www.scopus.com/pages/publications/105023179813
U2 - 10.1109/JSTARS.2025.3638382
DO - 10.1109/JSTARS.2025.3638382
M3 - Article
AN - SCOPUS:105023179813
SN - 1939-1404
VL - 19
SP - 1183
EP - 1196
JO - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
JF - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
ER -