
Frequency-Guided Detail and Semantic Fusion Network for Building Extraction in Multimodal Remote Sensing Imagery

  • Jianhao Li
  • Jue Wang*
  • Hao Shi*
  • Tianyu Wei
  • Liang Chen
  • Wei Li
  • *Corresponding author for this work
  • Beijing Institute of Technology
  • National Key Laboratory of Science and Technology on Space-Born Intelligent Information Processing

Research output: Contribution to journal › Article › peer-review

Abstract

Robust building extraction is a critical component of Earth observation (EO) applications, supporting urban planning and disaster response. Because atmospheric conditions and sensor-specific imaging mechanisms often hinder single-sensor interpretation, jointly exploiting optical and Synthetic Aperture Radar (SAR) imagery offers a viable route to all-weather monitoring. However, effective multimodal fusion in EO tasks faces two distinct challenges: noise amplification in low-level multimodal features and semantic misalignment between deep-level representations. In this work, we present a frequency-guided detail and semantic fusion network (FSFNet) for high-precision building extraction from co-registered optical and SAR imagery. FSFNet couples two complementary modules: a frequency-guided detail enhancement module (FDEM) that operates on shallow representations and a semantic feature fusion module (SFFM) that operates on deep representations. FDEM uses successive wavelet decompositions and cross-frequency fusion to decouple and asymmetrically combine low- and high-frequency information, while a spatial–frequency fusion component reinforces fine boundary cues from optical high-frequency signals. SFFM enforces modality-consistent semantic representations by applying a similarity-driven warping alignment that projects heterogeneous deep feature maps into a shared space for dense alignment and fusion. The integrated design produces boundary-aware and semantically consistent joint representations. Extensive experiments on public datasets show that FSFNet outperforms single-sensor baselines and other state-of-the-art multimodal methods. Qualitatively, it extracts building bodies more completely and markedly improves boundary fidelity across diverse scenes.
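The abstract describes FSFNet only at the level of its two modules, so the following is a minimal, non-learned sketch of the two underlying ideas, not the authors' implementation. It assumes single-channel shallow maps whose sides are divisible by 2^`levels`, uses a fixed Haar wavelet for the successive decompositions, and replaces learned cross-frequency fusion with a hypothetical fixed rule: blend the coarsest low-frequency bands of both modalities with weight `alpha`, but keep only the optical high-frequency bands. For the deep branch, it replaces warping alignment with a soft correspondence in which each optical position gathers SAR feature vectors weighted by a softmax over cosine similarities (temperature `tau`). All function and parameter names are hypothetical.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]  # H x W single-channel map, or N x C feature vectors


def haar_dwt2(x: Grid) -> Tuple[Grid, Tuple[Grid, Grid, Grid]]:
    """One level of an orthonormal 2-D Haar DWT (H and W must be even).
    Returns the low-frequency band LL and three high-frequency detail bands."""
    h, w = len(x), len(x[0])
    ll, d1, d2, d3 = ([[0.0] * (w // 2) for _row in range(h // 2)] for _band in range(4))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            ll[i // 2][j // 2] = (a + b + c + d) / 2  # scaled local sum (low frequency)
            d1[i // 2][j // 2] = (a - b + c - d) / 2  # left-right differences
            d2[i // 2][j // 2] = (a + b - c - d) / 2  # top-bottom differences
            d3[i // 2][j // 2] = (a - b - c + d) / 2  # diagonal differences
    return ll, (d1, d2, d3)


def haar_idwt2(ll: Grid, details: Tuple[Grid, Grid, Grid]) -> Grid:
    """Exact inverse of haar_dwt2."""
    d1, d2, d3 = details
    h, w = len(ll), len(ll[0])
    x = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            s, p, q, r = ll[i][j], d1[i][j], d2[i][j], d3[i][j]
            x[2 * i][2 * j] = (s + p + q + r) / 2
            x[2 * i][2 * j + 1] = (s - p + q - r) / 2
            x[2 * i + 1][2 * j] = (s + p - q - r) / 2
            x[2 * i + 1][2 * j + 1] = (s - p - q + r) / 2
    return x


def fuse_asymmetric(opt: Grid, sar: Grid, levels: int = 2, alpha: float = 0.5) -> Grid:
    """Hypothetical fixed-rule stand-in for cross-frequency fusion: decompose LL
    successively, blend only the coarsest LL bands, and rebuild with optical
    detail bands (SAR detail bands are discarded as speckle-prone)."""
    if levels == 0:
        return [[alpha * o + (1 - alpha) * s for o, s in zip(ro, rs)]
                for ro, rs in zip(opt, sar)]
    opt_ll, opt_details = haar_dwt2(opt)
    sar_ll, _sar_details = haar_dwt2(sar)
    fused_ll = fuse_asymmetric(opt_ll, sar_ll, levels - 1, alpha)
    return haar_idwt2(fused_ll, opt_details)


def cosine(u: List[float], v: List[float]) -> float:
    nu = math.sqrt(sum(a * a for a in u)) or 1.0  # guard against zero vectors
    nv = math.sqrt(sum(b * b for b in v)) or 1.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def similarity_align(opt_feats: Grid, sar_feats: Grid, tau: float = 0.1) -> Grid:
    """Soft correspondence (assumed stand-in for warping): each optical position
    takes a softmax(cosine / tau)-weighted average of all SAR feature vectors."""
    aligned = []
    for f in opt_feats:
        scores = [cosine(f, g) / tau for g in sar_feats]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        aligned.append([sum(wt * g[c] for wt, g in zip(weights, sar_feats)) / z
                        for c in range(len(f))])
    return aligned


def fuse_semantic(opt_feats: Grid, sar_feats: Grid, tau: float = 0.1) -> Grid:
    """Average optical features with SAR features aligned to them."""
    aligned = similarity_align(opt_feats, sar_feats, tau)
    return [[(a + b) / 2 for a, b in zip(f, g)] for f, g in zip(opt_feats, aligned)]
```

With `alpha` between 0 and 1, the fused shallow map keeps optical edge detail while its coarse intensity is a blend of both inputs; a trained network such as FSFNet would presumably learn these combination rules rather than fix them. In the deep branch, a low `tau` makes the soft correspondence approach a hard nearest-neighbour match, so SAR vectors that are spatially misplaced but semantically similar still land on the right optical position.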

Original language: English
Journal: IEEE Sensors Journal
DOIs
Publication status: Accepted/In press, 2026
Externally published: Yes

Keywords

  • Building extraction
  • feature alignment
  • frequency guidance
  • multi-source sensor data fusion
  • remote sensing data

