TY - JOUR
T1 - Effectiveness Guided Cross-Modal Information Sharing for Aligned RGB-T Object Detection
AU - An, Zijia
AU - Liu, Chunlei
AU - Han, Yuqi
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Integrating multi-modal data can significantly improve detection performance in complex scenes by introducing additional information about targets. However, most existing multi-modal detectors extract features from each modality separately, without considering the correlation between modalities. Given the spatial correlation across modalities in aligned multi-modal data, we exploit this correlation to share target information across modalities and thereby strengthen the targets' feature representation. To this end, in this letter, we propose an Effectiveness Guided Cross-Modal Information Sharing Network (ECISNet) for aligned multi-modal data, which can still accurately detect objects when one modality fails. Specifically, a Cross-Modal Information Sharing (CIS) module is proposed to enhance feature extraction by sharing target information across modalities. Because a failed modality may interfere with the others during sharing, we further design a Modal Effectiveness Guiding (MEG) module that guides the CIS module to exclude the interference of failed modalities. Extensive experiments on three recent multi-modal detection datasets demonstrate that ECISNet outperforms relevant state-of-the-art detection algorithms.
AB - Integrating multi-modal data can significantly improve detection performance in complex scenes by introducing additional information about targets. However, most existing multi-modal detectors extract features from each modality separately, without considering the correlation between modalities. Given the spatial correlation across modalities in aligned multi-modal data, we exploit this correlation to share target information across modalities and thereby strengthen the targets' feature representation. To this end, in this letter, we propose an Effectiveness Guided Cross-Modal Information Sharing Network (ECISNet) for aligned multi-modal data, which can still accurately detect objects when one modality fails. Specifically, a Cross-Modal Information Sharing (CIS) module is proposed to enhance feature extraction by sharing target information across modalities. Because a failed modality may interfere with the others during sharing, we further design a Modal Effectiveness Guiding (MEG) module that guides the CIS module to exclude the interference of failed modalities. Extensive experiments on three recent multi-modal detection datasets demonstrate that ECISNet outperforms relevant state-of-the-art detection algorithms.
KW - Aligned RGB-T object detection
KW - cross-modal learning
KW - modal effectiveness guiding
UR - http://www.scopus.com/inward/record.url?scp=85144804209&partnerID=8YFLogxK
U2 - 10.1109/LSP.2022.3229571
DO - 10.1109/LSP.2022.3229571
M3 - Article
AN - SCOPUS:85144804209
SN - 1070-9908
VL - 29
SP - 2562
EP - 2566
JO - IEEE Signal Processing Letters
JF - IEEE Signal Processing Letters
ER -