TY - JOUR
T1 - BMDENet
T2 - Bi-Directional Modality Difference Elimination Network for Few-Shot RGB-T Semantic Segmentation
AU - Zhao, Ying
AU - Song, Kechen
AU - Zhang, Yiming
AU - Yan, Yunhui
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023/11/1
Y1 - 2023/11/1
N2 - Few-shot semantic segmentation (FSS) aims to segment the target foreground of query images using only a few labeled support samples. Compared with fully supervised methods, FSS generalizes better to unseen classes and reduces the burden of annotating large pixel-level datasets. To cope with complex outdoor lighting environments, we introduce thermal infrared (T) images into the FSS task. However, existing RGB-T FSS methods all fuse the modalities directly, ignoring the differences between them, which may hinder cross-modal information interaction. Also considering the effect of successive downsampling on the results, we propose a bi-directional modality difference elimination network (BMDENet) to boost segmentation performance. Concretely, the bi-directional modality difference elimination module (BMDEM) reduces the heterogeneity between RGB and thermal images in the prototype space. The residual attention fusion module (RAFM) mines the bimodal features to fully fuse the cross-modal information. In addition, the mainstay and subsidiary enhancement module (MSEM) enhances the fused features to address the remaining problem of the aforementioned model. Extensive experiments on the Tokyo Multi-Spectral-4i dataset demonstrate that BMDENet achieves state-of-the-art performance in both 1-shot and 5-shot settings.
AB - Few-shot semantic segmentation (FSS) aims to segment the target foreground of query images using only a few labeled support samples. Compared with fully supervised methods, FSS generalizes better to unseen classes and reduces the burden of annotating large pixel-level datasets. To cope with complex outdoor lighting environments, we introduce thermal infrared (T) images into the FSS task. However, existing RGB-T FSS methods all fuse the modalities directly, ignoring the differences between them, which may hinder cross-modal information interaction. Also considering the effect of successive downsampling on the results, we propose a bi-directional modality difference elimination network (BMDENet) to boost segmentation performance. Concretely, the bi-directional modality difference elimination module (BMDEM) reduces the heterogeneity between RGB and thermal images in the prototype space. The residual attention fusion module (RAFM) mines the bimodal features to fully fuse the cross-modal information. In addition, the mainstay and subsidiary enhancement module (MSEM) enhances the fused features to address the remaining problem of the aforementioned model. Extensive experiments on the Tokyo Multi-Spectral-4i dataset demonstrate that BMDENet achieves state-of-the-art performance in both 1-shot and 5-shot settings.
KW - Few-shot semantic segmentation
KW - RGB-T FSS
KW - cross-modal
KW - difference elimination
UR - http://www.scopus.com/inward/record.url?scp=85161038927&partnerID=8YFLogxK
U2 - 10.1109/TCSII.2023.3278941
DO - 10.1109/TCSII.2023.3278941
M3 - Article
AN - SCOPUS:85161038927
SN - 1549-7747
VL - 70
SP - 4266
EP - 4270
JO - IEEE Transactions on Circuits and Systems II: Express Briefs
JF - IEEE Transactions on Circuits and Systems II: Express Briefs
IS - 11
ER -