Deep CNN based binary hash video representations for face retrieval

Zhen Dong; Chenchen Jing; Mingtao Pei; Yunde Jia

doi:10.1016/j.patcog.2018.04.014

Zhen Dong, Chenchen Jing, Mingtao Pei^*, Yunde Jia

^*Corresponding author for this work

School of Computer Science

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

30 Citations (Scopus)

Abstract

In this paper, a novel deep convolutional neural network is proposed to learn discriminative binary hash video representations for face retrieval. The network integrates face feature extractor and hash functions into a unified optimization framework to make the two components be as compatible as possible. In order to achieve better initializations for the optimization, the low-rank discriminative binary hashing method is introduced to pre-learn the hash functions of the network during the training procedure. The input to the network is a face frame, and the output is the corresponding binary hash frame representation. Frame representations of a face video shot are fused by hard voting to generate the binary hash video representation. Each bit in the binary representation of frame/video describes the presence or absence of a face attribute, which makes it possible to retrieve faces among both the image and video domains. Extensive experiments are conducted on two challenging TV-Series datasets, and the excellent performance demonstrates the effectiveness of the proposed network.
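The fusion step described in the abstract can be illustrated with a minimal Python sketch. It assumes that "hard voting" means a per-bit majority vote over the frame codes, that ties resolve to 0, and that retrieval compares codes by Hamming distance; the abstract states none of these details, and the function names `fuse_by_hard_voting` and `hamming_distance` are hypothetical, not taken from the paper.

```python
import numpy as np


def fuse_by_hard_voting(frame_codes):
    """Fuse per-frame binary codes (n_frames x n_bits, values 0/1) into one video code."""
    frame_codes = np.asarray(frame_codes, dtype=np.uint8)
    n_frames = frame_codes.shape[0]
    # Count how many frames set each bit.
    votes = frame_codes.sum(axis=0)
    # Assumption: a bit is set only when strictly more than half of the frames set it.
    return (2 * votes > n_frames).astype(np.uint8)


def hamming_distance(code_a, code_b):
    """Number of bit positions in which two binary codes differ."""
    return int(np.count_nonzero(np.asarray(code_a) != np.asarray(code_b)))


# Three made-up frame codes of one video shot, four bits each.
frames = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]
video_code = fuse_by_hard_voting(frames)

# A made-up image-domain query code, compared against the video code.
query = [1, 0, 0, 1]
distance = hamming_distance(video_code, query)
```

Because frame codes and video codes have the same length and bit semantics, the same distance function can compare an image query against a video entry, which is the cross-domain property the abstract points to.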

Original language	English
Pages (from-to)	357-369
Number of pages	13
Journal	Pattern Recognition
Volume	81
DOI	https://doi.org/10.1016/j.patcog.2018.04.014
Publication status	Published - Sep 2018

Cite this

@article{77091c0a834d4f1e9fe21ecb8af6e51c,
title = "Deep CNN based binary hash video representations for face retrieval",
abstract = "In this paper, a novel deep convolutional neural network is proposed to learn discriminative binary hash video representations for face retrieval. The network integrates face feature extractor and hash functions into a unified optimization framework to make the two components be as compatible as possible. In order to achieve better initializations for the optimization, the low-rank discriminative binary hashing method is introduced to pre-learn the hash functions of the network during the training procedure. The input to the network is a face frame, and the output is the corresponding binary hash frame representation. Frame representations of a face video shot are fused by hard voting to generate the binary hash video representation. Each bit in the binary representation of frame/video describes the presence or absence of a face attribute, which makes it possible to retrieve faces among both the image and video domains. Extensive experiments are conducted on two challenging TV-Series datasets, and the excellent performance demonstrates the effectiveness of the proposed network.",
keywords = "Cross-domain face retrieval, Deep CNN, Face video retrieval, Hash learning",
author = "Zhen Dong and Chenchen Jing and Mingtao Pei and Yunde Jia",
note = "Publisher Copyright: {\textcopyright} 2018 Elsevier Ltd",
year = "2018",
month = sep,
doi = "10.1016/j.patcog.2018.04.014",
language = "English",
volume = "81",
pages = "357--369",
journal = "Pattern Recognition",
issn = "0031-3203",
publisher = "Elsevier Ltd.",
}

TY  - JOUR
T1  - Deep CNN based binary hash video representations for face retrieval
AU  - Dong, Zhen
AU  - Jing, Chenchen
AU  - Pei, Mingtao
AU  - Jia, Yunde
PY  - 2018/9
Y1  - 2018/9
N2  - In this paper, a novel deep convolutional neural network is proposed to learn discriminative binary hash video representations for face retrieval. The network integrates face feature extractor and hash functions into a unified optimization framework to make the two components be as compatible as possible. In order to achieve better initializations for the optimization, the low-rank discriminative binary hashing method is introduced to pre-learn the hash functions of the network during the training procedure. The input to the network is a face frame, and the output is the corresponding binary hash frame representation. Frame representations of a face video shot are fused by hard voting to generate the binary hash video representation. Each bit in the binary representation of frame/video describes the presence or absence of a face attribute, which makes it possible to retrieve faces among both the image and video domains. Extensive experiments are conducted on two challenging TV-Series datasets, and the excellent performance demonstrates the effectiveness of the proposed network.
KW  - Cross-domain face retrieval
KW  - Deep CNN
KW  - Face video retrieval
KW  - Hash learning
UR  - http://www.scopus.com/inward/record.url?scp=85045760970&partnerID=8YFLogxK
U2  - 10.1016/j.patcog.2018.04.014
DO  - 10.1016/j.patcog.2018.04.014
M3  - Article
AN  - SCOPUS:85045760970
SN  - 0031-3203
VL  - 81
SP  - 357
EP  - 369
JO  - Pattern Recognition
JF  - Pattern Recognition
ER  - 
