Effectiveness Guided Cross-Modal Information Sharing for Aligned RGB-T Object Detection

Zijia An, Chunlei Liu, Yuqi Han*

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

14 Citations (Scopus)

Abstract

Integrating multi-modal data can significantly improve detection performance in complex scenes by introducing additional information about targets. However, most existing multi-modal detectors extract features from each modality separately, without considering the correlation between modalities. Because aligned multi-modal data are spatially correlated across modalities, we exploit this correlation to share target information across modalities, thereby enhancing the feature representation of targets. To this end, in this letter, we propose an Effectiveness Guided Cross-Modal Information Sharing Network (ECISNet) for aligned multi-modal data, which can still accurately detect objects when a modality fails. Specifically, the Cross-Modal Information Sharing (CIS) module is proposed to enhance feature extraction by sharing information about targets across modalities. Then, because a failed modality may interfere with other modalities during information sharing, we design a Modal Effectiveness Guiding (MEG) module that guides the CIS module to exclude the interference of failed modalities. Extensive experiments on three recent multi-modal detection datasets demonstrate that ECISNet outperforms relevant state-of-the-art detection algorithms.
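The idea of effectiveness-guided sharing can be illustrated with a toy sketch. This is not the ECISNet implementation: the abstract does not specify how the CIS or MEG modules are built, so everything below is an assumption made for illustration. In particular, the `effectiveness` score is a hand-crafted energy proxy (the paper's MEG module is presumably learned), the features are flattened lists standing in for spatially aligned feature maps, and the names `effectiveness` and `share` are hypothetical.

```python
from typing import List, Tuple

# A flattened feature map; RGB and thermal are assumed spatially aligned,
# so index i refers to the same location in both modalities.
Feature = List[float]


def effectiveness(feat: Feature) -> float:
    """Hypothetical effectiveness score in [0, 1).

    Assumption: a failed modality (e.g. an all-dark RGB frame) produces
    near-zero activations, so its mean energy, and hence its score, is ~0.
    """
    energy = sum(x * x for x in feat) / len(feat)
    return energy / (1.0 + energy)


def share(rgb: Feature, thermal: Feature) -> Tuple[Feature, Feature]:
    """Toy cross-modal sharing gated by the other modality's effectiveness.

    Each modality receives the other's features, scaled by how effective
    the other modality is, so a failed modality contributes nothing.
    """
    w_rgb = effectiveness(rgb)
    w_thermal = effectiveness(thermal)
    rgb_out = [r + w_thermal * t for r, t in zip(rgb, thermal)]
    thermal_out = [t + w_rgb * r for r, t in zip(rgb, thermal)]
    return rgb_out, thermal_out


# Simulated failure of the RGB modality: all activations are zero.
failed_rgb = [0.0, 0.0, 0.0, 0.0]
thermal = [1.0, 2.0, 3.0, 4.0]
rgb_shared, thermal_shared = share(failed_rgb, thermal)
```

In this sketch the thermal features pass through unchanged when RGB fails, while the RGB branch is filled in from the thermal branch, which mirrors the behaviour the abstract attributes to the MEG-guided CIS module.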

Original language: English
Pages (from-to): 2562-2566
Number of pages: 5
Journal: IEEE Signal Processing Letters
Volume: 29
DOIs
Publication status: Published - 2022
Externally published: Yes

Keywords

  • Aligned RGB-T object detection
  • cross-modal learning
  • modal effectiveness guiding
