TY - JOUR
T1 - Multimodal air-quality prediction
T2 - A multimodal feature fusion network based on shared-specific modal feature decoupling
AU - Chen, Xiaoxia
AU - Wang, Zhen
AU - Dong, Fangyan
AU - Hirota, Kaoru
N1 - Publisher Copyright: © 2025
PY - 2025/8
Y1 - 2025/8
N2 - Severe air pollution degrades air quality and threatens human health, necessitating accurate prediction for pollution control. While spatiotemporal networks integrating sequence models and graph structures dominate current methods, prior work neglects multimodal data fusion to enhance feature representation. This study addresses the spatial limitations of single-perspective ground monitoring by synergizing remote sensing data, which provides global air quality distribution, with ground observations. We propose a Shared-Specific Modality Decoupling-based Spatiotemporal Multimodal Fusion Network for air-quality prediction, comprising: (1) feature extractors for remote sensing images and ground monitoring data, (2) a decoupling module separating shared and modality-specific features, and (3) a hierarchical attention-graph convolution fusion module. This framework achieves effective multimodal fusion by disentangling cross-modal dependencies while preserving unique characteristics. Evaluations on two real-world datasets demonstrate superior performance over baseline models, validating the efficacy of multimodal integration for spatial–temporal air quality forecasting.
KW - Air-quality prediction
KW - Multimodal fusion
KW - Spatial–temporal network
KW - Time series forecasting
UR - http://www.scopus.com/inward/record.url?scp=105008505497&partnerID=8YFLogxK
U2 - 10.1016/j.envsoft.2025.106553
DO - 10.1016/j.envsoft.2025.106553
M3 - Article
AN - SCOPUS:105008505497
SN - 1364-8152
VL - 192
JO - Environmental Modelling and Software
JF - Environmental Modelling and Software
M1 - 106553
ER -