TY - JOUR
T1 - Implicit relative attribute enabled cross-modality hashing for face image-video retrieval
AU - Dai, Peng
AU - Wang, Xue
AU - Zhang, Weihang
AU - Zhang, Pengbo
AU - You, Wei
N1 - Publisher Copyright:
© 2018, Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2018/9/1
Y1 - 2018/9/1
N2 - Face image-video retrieval refers to retrieving videos of a specific person with an image query, or searching face images of a person using a video clip query. It has attracted much attention for broad applications such as suspect tracking and identification. This paper proposes a novel implicit relative attribute enabled cross-modality hashing (IRAH) method for large-scale face image-video retrieval. To cope with large-scale data, the proposed IRAH method facilitates fast cross-modality retrieval by embedding two entirely heterogeneous spaces, i.e., face images in Euclidean space and face videos on a Riemannian manifold, into a unified compact Hamming space. To bridge the semantic gap, IRAH maps the original low-level kernelized features to discriminative high-level implicit relative attributes. Therefore, retrieval accuracy can be improved by leveraging both the label information across different modalities and the semantic structure obtained from the implicit relative attributes in each modality. To evaluate the proposed method, we conduct extensive experiments on two publicly available databases, i.e., the Big Bang Theory (BBT) and Buffy the Vampire Slayer (BVS). The experimental results demonstrate the superiority of the proposed method over various state-of-the-art cross-modality hashing methods. The performance gains are especially significant when the hash code length is 8 bits, with up to 12% improvement over the second-best tested method.
AB - Face image-video retrieval refers to retrieving videos of a specific person with an image query, or searching face images of a person using a video clip query. It has attracted much attention for broad applications such as suspect tracking and identification. This paper proposes a novel implicit relative attribute enabled cross-modality hashing (IRAH) method for large-scale face image-video retrieval. To cope with large-scale data, the proposed IRAH method facilitates fast cross-modality retrieval by embedding two entirely heterogeneous spaces, i.e., face images in Euclidean space and face videos on a Riemannian manifold, into a unified compact Hamming space. To bridge the semantic gap, IRAH maps the original low-level kernelized features to discriminative high-level implicit relative attributes. Therefore, retrieval accuracy can be improved by leveraging both the label information across different modalities and the semantic structure obtained from the implicit relative attributes in each modality. To evaluate the proposed method, we conduct extensive experiments on two publicly available databases, i.e., the Big Bang Theory (BBT) and Buffy the Vampire Slayer (BVS). The experimental results demonstrate the superiority of the proposed method over various state-of-the-art cross-modality hashing methods. The performance gains are especially significant when the hash code length is 8 bits, with up to 12% improvement over the second-best tested method.
KW - Cross-modality similarity search
KW - Face image-video retrieval
KW - Hashing
KW - Human attribute
UR - http://www.scopus.com/inward/record.url?scp=85051651656&partnerID=8YFLogxK
U2 - 10.1007/s11042-018-5684-3
DO - 10.1007/s11042-018-5684-3
M3 - Article
AN - SCOPUS:85051651656
SN - 1380-7501
VL - 77
SP - 23547
EP - 23577
JO - Multimedia Tools and Applications
JF - Multimedia Tools and Applications
IS - 18
ER -