Abstract
Robust building extraction is a critical component of Earth observation (EO) applications, facilitating urban planning and disaster response. While atmospheric conditions or imaging mechanisms often hinder single-sensor interpretation, the joint exploitation of optical and Synthetic Aperture Radar (SAR) imagery offers a viable solution for all-weather monitoring. However, effective multimodal fusion in EO tasks faces two particular challenges: noise amplification in low-level multimodal features and semantic misalignment in deep-level representations. In this work, we present a frequency-guided detail and semantic fusion network (FSFNet) for high-precision building extraction from co-registered optical and SAR imagery. FSFNet couples two complementary modules: a frequency-guided detail enhancement module (FDEM) that operates on shallow representations and a semantic feature fusion module (SFFM) that operates on deep representations. FDEM uses successive wavelet decompositions and cross-frequency fusion to decouple and asymmetrically combine low- and high-frequency information, while a spatial–frequency fusion component reinforces fine boundary cues from optical high-frequency signals. SFFM enforces modality-consistent semantic representations by applying a similarity-driven warping alignment that projects heterogeneous deep maps into a shared space for dense alignment and fusion. The integrated design produces boundary-aware and semantically consistent joint representations. Extensive experiments on public datasets demonstrate that FSFNet outperforms single-sensor baselines and other state-of-the-art multimodal methods. Qualitatively, the method achieves more complete building-body extraction while markedly improving boundary fidelity across diverse scenes.
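The frequency decoupling described for FDEM can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single-level 2D Haar wavelet, single-channel inputs with even height and width, and a hypothetical fusion rule (`fuse_asymmetric`, with an assumed `low_weight` parameter) that blends the low-frequency bands of both modalities while taking high-frequency detail from the optical input only. All function names here are illustrative.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2D Haar transform (assumes even height and width)."""
    a = x[0::2, 0::2]  # top-left sample of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    low = (a + b + c + d) / 2.0       # low-frequency approximation
    row_diff = (a + b - c - d) / 2.0  # detail: top row minus bottom row
    col_diff = (a - b + c - d) / 2.0  # detail: left column minus right column
    diag = (a - b - c + d) / 2.0      # detail: diagonal difference
    return low, (row_diff, col_diff, diag)


def haar_idwt2(low, highs):
    """Inverse of haar_dwt2: rebuilds the full-resolution array."""
    row_diff, col_diff, diag = highs
    out = np.empty((low.shape[0] * 2, low.shape[1] * 2), dtype=float)
    out[0::2, 0::2] = (low + row_diff + col_diff + diag) / 2.0
    out[0::2, 1::2] = (low + row_diff - col_diff - diag) / 2.0
    out[1::2, 0::2] = (low - row_diff + col_diff - diag) / 2.0
    out[1::2, 1::2] = (low - row_diff - col_diff + diag) / 2.0
    return out


def fuse_asymmetric(optical, sar, low_weight=0.5):
    """Hypothetical fusion rule (an assumption, not taken from the paper):
    blend the low-frequency bands of both inputs, but keep only the
    optical high-frequency bands."""
    low_opt, highs_opt = haar_dwt2(optical)
    low_sar, _ = haar_dwt2(sar)  # SAR detail bands are discarded here
    low = low_weight * low_opt + (1.0 - low_weight) * low_sar
    return haar_idwt2(low, highs_opt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    optical = rng.random((8, 8))
    sar = rng.random((8, 8))
    fused = fuse_asymmetric(optical, sar)
    print(fused.shape)
```

In this sketch the asymmetry is hard-coded. A learned module would presumably weight the bands adaptively and apply the decomposition to feature maps rather than raw pixels, repeating it over several levels.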
| Original language | English |
|---|---|
| Journal | IEEE Sensors Journal |
| Publication status | Accepted/In press - 2026 |
| Externally published | Yes |
Keywords
- Building extraction
- feature alignment
- frequency guidance
- multi-source sensor data fusion
- remote sensing data
Title
Frequency-Guided Detail and Semantic Fusion Network for Building Extraction in Multimodal Remote Sensing Imagery