TY - JOUR
T1 - BMDENet
T2 - Bi-Directional Modality Difference Elimination Network for Few-Shot RGB-T Semantic Segmentation
AU - Zhao, Ying
AU - Song, Kechen
AU - Zhang, Yiming
AU - Yan, Yunhui
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023/11/1
Y1 - 2023/11/1
N2 - Few-shot semantic segmentation (FSS) aims to segment the target foreground of query images using only a few labeled support samples. Compared with fully supervised methods, FSS generalizes better to unseen classes and reduces the burden of annotating large pixel-level datasets. To cope with complex outdoor lighting environments, we introduce thermal infrared (T) images into the FSS task. However, existing RGB-T FSS methods all fuse the modalities directly, ignoring the differences between them, which may hinder cross-modal information interaction. Also considering the effect of successive downsampling on the results, we propose a bi-directional modality difference elimination network (BMDENet) to boost segmentation performance. Concretely, the bi-directional modality difference elimination module (BMDEM) reduces the heterogeneity between RGB and thermal images in the prototype space. The residual attention fusion module (RAFM) mines the bimodal features to fully fuse the cross-modal information. In addition, the mainstay and subsidiary enhancement module (MSEM) enhances the fused features to address the remaining problem of the aforementioned model. Extensive experiments on the Tokyo Multi-Spectral-4i dataset demonstrate that BMDENet achieves state-of-the-art performance in both 1-shot and 5-shot settings.
AB - Few-shot semantic segmentation (FSS) aims to segment the target foreground of query images using only a few labeled support samples. Compared with fully supervised methods, FSS generalizes better to unseen classes and reduces the burden of annotating large pixel-level datasets. To cope with complex outdoor lighting environments, we introduce thermal infrared (T) images into the FSS task. However, existing RGB-T FSS methods all fuse the modalities directly, ignoring the differences between them, which may hinder cross-modal information interaction. Also considering the effect of successive downsampling on the results, we propose a bi-directional modality difference elimination network (BMDENet) to boost segmentation performance. Concretely, the bi-directional modality difference elimination module (BMDEM) reduces the heterogeneity between RGB and thermal images in the prototype space. The residual attention fusion module (RAFM) mines the bimodal features to fully fuse the cross-modal information. In addition, the mainstay and subsidiary enhancement module (MSEM) enhances the fused features to address the remaining problem of the aforementioned model. Extensive experiments on the Tokyo Multi-Spectral-4i dataset demonstrate that BMDENet achieves state-of-the-art performance in both 1-shot and 5-shot settings.
KW - Few-shot semantic segmentation
KW - RGB-T FSS
KW - cross-modal
KW - difference elimination
UR - http://www.scopus.com/inward/record.url?scp=85161038927&partnerID=8YFLogxK
U2 - 10.1109/TCSII.2023.3278941
DO - 10.1109/TCSII.2023.3278941
M3 - Article
AN - SCOPUS:85161038927
SN - 1549-7747
VL - 70
SP - 4266
EP - 4270
JO - IEEE Transactions on Circuits and Systems II: Express Briefs
JF - IEEE Transactions on Circuits and Systems II: Express Briefs
IS - 11
ER -