TY - JOUR
T1 - Alleviating Modality Bias Training for Infrared-Visible Person Re-Identification
AU - Huang, Yan
AU - Wu, Qiang
AU - Xu, Jingsong
AU - Zhong, Yi
AU - Zhang, Peng
AU - Zhang, Zhaoxiang
N1 - Publisher Copyright:
© 1999-2012 IEEE.
PY - 2022
Y1 - 2022
AB - The task of infrared-visible person re-identification (IV-reID) is to recognize people across two modalities (i.e., RGB and IR). Existing cutting-edge approaches typically feed pairs of images sharing the same identity (i.e., ID-tied cross-modality image pairs) into an ImageNet-pretrained ResNet50 backbone, which learns shared features that tolerate the modality discrepancy between RGB and IR. This work unveils a Modality Bias Training (MBT) problem that has received little attention in IV-reID and demonstrates that MBT significantly compromises IV-reID performance. Because the ResNet50 backbone is pretrained on a large number of RGB images from ImageNet, IR information can be overwhelmed by RGB information during training. The trained models are therefore biased toward RGB information, which in turn compromises their cross-modality generalization ability. To tackle this issue, we present a Dual-level Learning Strategy (DLS) that 1) enforces the focus of the network on ID-exclusive (rather than ID-tied) labels of cross-modality image pairs to mitigate MBT and 2) introduces third-modality data that contain both RGB and IR information to further prevent IR information from being overwhelmed during training. The third-modality images are generated by a generative adversarial network, and a dynamic ID-exclusive Smooth (dIDeS) label is proposed for them. Comprehensive experiments demonstrate the success of DLS in tackling the MBT issue exposed in IV-reID.
KW - Cross modality
KW - modality bias training
KW - person re-identification
UR - http://www.scopus.com/inward/record.url?scp=85103277513&partnerID=8YFLogxK
U2 - 10.1109/TMM.2021.3067760
DO - 10.1109/TMM.2021.3067760
M3 - Article
AN - SCOPUS:85103277513
SN - 1520-9210
VL - 24
SP - 1570
EP - 1582
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -