Implicit relative attribute enabled cross-modality hashing for face image-video retrieval

Peng Dai; Xue Wang; Weihang Zhang; Pengbo Zhang; Wei You

doi:10.1007/s11042-018-5684-3

Peng Dai, Xue Wang*, Weihang Zhang, Pengbo Zhang, Wei You

*Corresponding author for this work

Tsinghua University

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

Face image-video retrieval refers to retrieving videos of a specific person with an image query, or searching for face images of a person with a video clip query. It has attracted much attention because of broad applications such as suspect tracking and identification. This paper proposes a novel implicit relative attribute enabled cross-modality hashing (IRAH) method for large-scale face image-video retrieval. To cope with large-scale data, the proposed IRAH method facilitates fast cross-modality retrieval by embedding two entirely heterogeneous spaces, i.e., face images in Euclidean space and face videos on a Riemannian manifold, into a unified compact Hamming space. To bridge the semantic gap, IRAH maps the original low-level kernelized features to discriminative high-level implicit relative attributes. Retrieval accuracy can therefore be improved by leveraging both the label information across modalities and the semantic structure obtained from the implicit relative attributes in each modality. To evaluate the proposed method, we conduct extensive experiments on two publicly available databases, the Big Bang Theory (BBT) and Buffy the Vampire Slayer (BVS). The experimental results demonstrate the superiority of the proposed method over several state-of-the-art cross-modality hashing methods. The performance gains are especially significant when the hash code length is 8 bits, with improvements of up to 12% over the second-best of the tested methods.
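
As a rough illustration of why a unified Hamming space makes cross-modality retrieval fast, the sketch below ranks video clips against an image query by Hamming distance. It is not the IRAH method: it assumes the binary codes have already been learned, and the 8-bit code values, the item names, and the helper functions `hamming_distance` and `rank_by_hamming` are all hypothetical.

```python
# Illustrative only: ranking by Hamming distance over pre-computed binary codes.
# The codes are made up; learning them (the hashing step itself) is not shown.


def hamming_distance(a: int, b: int) -> int:
    """Count the differing bits between two hash codes stored as integers."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code: int, database: dict[str, int]) -> list[tuple[str, int]]:
    """Return (item id, distance) pairs sorted by increasing distance to the query."""
    scored = [
        (item_id, hamming_distance(query_code, code))
        for item_id, code in database.items()
    ]
    # Break ties on the item id so the ordering is deterministic.
    return sorted(scored, key=lambda pair: (pair[1], pair[0]))


# Hypothetical 8-bit codes for three video clips and one image query.
video_codes = {
    "clip_a": 0b10110010,
    "clip_b": 0b10110110,
    "clip_c": 0b01001101,
}
image_query = 0b10110011

ranking = rank_by_hamming(image_query, video_codes)
print(ranking)
```

Because both modalities share one code space, comparing an image with a video reduces to an XOR and a bit count, which is what keeps large-scale search cheap once the codes exist.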

Original language	English
Pages (from-to)	23547-23577
Number of pages	31
Journal	Multimedia Tools and Applications
Volume	77
Issue number	18
DOIs	https://doi.org/10.1007/s11042-018-5684-3
Publication status	Published - 1 Sept 2018
Externally published	Yes

Keywords

Cross-modality similarity search
Face image-video retrieval
Hashing
Human attribute

Cite this

@article{ab42923c0558497d97746e161c26426d,

title = "Implicit relative attribute enabled cross-modality hashing for face image-video retrieval",

abstract = "Face image-video retrieval refers to retrieving videos of a specific person with image query or searching face images of one person by using a video clip query. It has attracted much attention for broad applications like suspect tracking and identifying. This paper proposes a novel implicit relative attribute enabled cross-modality hashing (IRAH) method for large-scale face image-video retrieval. To cope with large-scale data, the proposed IRAH method facilitates fast cross-modality retrieval through embedding two entirely heterogeneous spaces, i.e., face images in Euclidean space and face videos on a Riemannian manifold, into a unified compact Hamming space. In order to resolve the semantic gap, IRAH maps the original low-level kernelized features to discriminative high-level implicit relative attributes. Therefore, the retrieval accuracy can be improved by leveraging both the label information across different modalities and the semantic structure obtained from the implicit relative attributes in each modality. To evaluate the proposed method, we conduct extensive experiments on two publicly available databases, i.e., the Big Bang Theory (BBT) and Buffy the Vampire Slayer (BVS). The experimental results demonstrate the superiority of the proposed method over different state-of-the-art cross-modality hashing methods. The performance gains are especially significant in the case that the hash code length is 8 bits, up to 12\% improvements over the second best method among tested methods.",

keywords = "Cross-modality similarity search, Face image-video retrieval, Hashing, Human attribute",

author = "Peng Dai and Xue Wang and Weihang Zhang and Pengbo Zhang and Wei You",

note = "Publisher Copyright: {\textcopyright} 2018, Springer Science+Business Media, LLC, part of Springer Nature.",

year = "2018",

month = sep,

day = "1",

doi = "10.1007/s11042-018-5684-3",

language = "English",

volume = "77",

pages = "23547--23577",

journal = "Multimedia Tools and Applications",

issn = "1380-7501",

publisher = "Springer",

number = "18",

}

TY - JOUR

T1 - Implicit relative attribute enabled cross-modality hashing for face image-video retrieval

AU - Dai, Peng

AU - Wang, Xue

AU - Zhang, Weihang

AU - Zhang, Pengbo

AU - You, Wei

PY - 2018/9/1

Y1 - 2018/9/1

N2 - Face image-video retrieval refers to retrieving videos of a specific person with image query or searching face images of one person by using a video clip query. It has attracted much attention for broad applications like suspect tracking and identifying. This paper proposes a novel implicit relative attribute enabled cross-modality hashing (IRAH) method for large-scale face image-video retrieval. To cope with large-scale data, the proposed IRAH method facilitates fast cross-modality retrieval through embedding two entirely heterogeneous spaces, i.e., face images in Euclidean space and face videos on a Riemannian manifold, into a unified compact Hamming space. In order to resolve the semantic gap, IRAH maps the original low-level kernelized features to discriminative high-level implicit relative attributes. Therefore, the retrieval accuracy can be improved by leveraging both the label information across different modalities and the semantic structure obtained from the implicit relative attributes in each modality. To evaluate the proposed method, we conduct extensive experiments on two publicly available databases, i.e., the Big Bang Theory (BBT) and Buffy the Vampire Slayer (BVS). The experimental results demonstrate the superiority of the proposed method over different state-of-the-art cross-modality hashing methods. The performance gains are especially significant in the case that the hash code length is 8 bits, up to 12% improvements over the second best method among tested methods.

AB - Face image-video retrieval refers to retrieving videos of a specific person with image query or searching face images of one person by using a video clip query. It has attracted much attention for broad applications like suspect tracking and identifying. This paper proposes a novel implicit relative attribute enabled cross-modality hashing (IRAH) method for large-scale face image-video retrieval. To cope with large-scale data, the proposed IRAH method facilitates fast cross-modality retrieval through embedding two entirely heterogeneous spaces, i.e., face images in Euclidean space and face videos on a Riemannian manifold, into a unified compact Hamming space. In order to resolve the semantic gap, IRAH maps the original low-level kernelized features to discriminative high-level implicit relative attributes. Therefore, the retrieval accuracy can be improved by leveraging both the label information across different modalities and the semantic structure obtained from the implicit relative attributes in each modality. To evaluate the proposed method, we conduct extensive experiments on two publicly available databases, i.e., the Big Bang Theory (BBT) and Buffy the Vampire Slayer (BVS). The experimental results demonstrate the superiority of the proposed method over different state-of-the-art cross-modality hashing methods. The performance gains are especially significant in the case that the hash code length is 8 bits, up to 12% improvements over the second best method among tested methods.

KW - Cross-modality similarity search

KW - Face image-video retrieval

KW - Hashing

KW - Human attribute

UR - http://www.scopus.com/inward/record.url?scp=85051651656&partnerID=8YFLogxK

U2 - 10.1007/s11042-018-5684-3

DO - 10.1007/s11042-018-5684-3

M3 - Article

AN - SCOPUS:85051651656

SN - 1380-7501

VL - 77

SP - 23547

EP - 23577

JO - Multimedia Tools and Applications

JF - Multimedia Tools and Applications

IS - 18

ER -
