PrimKD: Primary Modality Guided Multimodal Fusion for RGB-D Semantic Segmentation

Zhiwei Hao, Zhongyu Xiao, Yong Luo, Jianyuan Guo*, Jing Wang, Li Shen, Han Hu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Recent advancements in cross-modal transformers have demonstrated superior performance on RGB-D segmentation tasks by effectively integrating information from the RGB and depth modalities. However, existing methods often overlook the varying levels of informative content in each modality, treating them equally and using models of the same architecture. This oversight can hinder segmentation performance, especially since RGB images typically contain significantly more information than depth images. To address this issue, we propose PrimKD, a knowledge distillation-based approach for guided multimodal fusion that emphasizes the primary RGB modality. In our approach, a model trained exclusively on the RGB modality serves as the teacher, guiding the learning of a student model that fuses both RGB and depth modalities. To prioritize information from the primary RGB modality while still leveraging the depth modality, we incorporate primary-focused feature reconstruction and a selective alignment scheme. This integration enhances the overall feature fusion, resulting in improved segmentation results. We evaluate our proposed method on the NYU Depth V2 and SUN-RGBD datasets, and the experimental results demonstrate the effectiveness of PrimKD. Specifically, our approach achieves mIoU scores of 57.8 and 52.5 on these two datasets, respectively, surpassing existing counterparts by 1.5 and 0.4 mIoU. The code is available at https://github.com/xiaoshideta/PrimKD.
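The core idea the abstract describes — aligning a student's fused RGB-D features to a frozen RGB-only teacher, and doing so selectively rather than uniformly — can be sketched as a distillation loss. The snippet below is a minimal, hypothetical illustration in NumPy: the function name, the choice of squared error, and the use of teacher activation magnitude as the selection criterion are assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

def primary_focused_distill_loss(teacher_feat, student_feat, top_ratio=0.5):
    """Sketch of a selective feature-distillation loss (illustrative only).

    Aligns student (RGB-D fused) features to a frozen RGB-only teacher,
    but only at the spatial positions where the teacher's activation
    magnitude is largest -- a stand-in for a selective alignment scheme;
    the paper's actual criterion may differ.

    teacher_feat, student_feat: arrays of shape (H, W, C).
    """
    # Per-position squared error between teacher and student features.
    err = ((teacher_feat - student_feat) ** 2).mean(axis=-1)  # shape (H, W)

    # Rank positions by mean absolute teacher activation (assumed saliency).
    saliency = np.abs(teacher_feat).mean(axis=-1).ravel()
    k = max(1, int(top_ratio * saliency.size))
    top_idx = np.argsort(saliency)[-k:]

    # Average the reconstruction error over the selected positions only.
    return err.ravel()[top_idx].mean()

# Toy usage with random features standing in for real network activations.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 8, 16))   # RGB-only teacher features
student = rng.normal(size=(8, 8, 16))   # fused RGB-D student features
loss = primary_focused_distill_loss(teacher, student)
```

In training, this term would be added to the usual segmentation loss, so the student learns the task while its fused representation is pulled toward the teacher's RGB representation at the positions deemed most informative.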

Original language: English
Title of host publication: MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia
Publisher: Association for Computing Machinery, Inc
Pages: 1943-1951
Number of pages: 9
ISBN (Electronic): 9798400706868
DOIs
Publication status: Published - 28 Oct 2024
Event: 32nd ACM International Conference on Multimedia, MM 2024 - Melbourne, Australia
Duration: 28 Oct 2024 - 1 Nov 2024

Publication series

Name: MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia

Conference

Conference: 32nd ACM International Conference on Multimedia, MM 2024
Country/Territory: Australia
City: Melbourne
Period: 28/10/24 - 1/11/24

Keywords

  • knowledge distillation
  • multimodal fusion
  • rgb-d segmentation
