Heterogeneous hashing network for face retrieval across image and video domains

Chenchen Jing; Zhen Dong; Mingtao Pei; Yunde Jia

doi:10.1109/TMM.2018.2866222

Chenchen Jing, Zhen Dong, Mingtao Pei^*, Yunde Jia

^*Corresponding author for this work

School of Computer Science

Beijing Institute of Technology

Research output: Contribution to journal › Article › Peer-reviewed

25 Citations (Scopus)

Abstract

In this paper, we present a heterogeneous hashing network to generate effective and compact hash representations of both face images and face videos for face retrieval across image and video domains. The network contains an image branch and a video branch to project face images and videos into a common space, respectively. Then, the non-linear hash functions are learned in the common space to obtain the corresponding binary hash representations. The network is trained with three loss functions: 1) the Fisher loss; 2) the softmax loss; and 3) the triplet ranking loss. The Fisher loss uses the difference form of within-class and between-class scatter and is appropriate for the mini-batch-based optimization method. The Fisher loss together with the softmax loss is exploited to enhance the discriminative power of the common space. The triplet ranking loss is enforced on the final binary hash representations to improve retrieval performance. Experiments on a large-scale face video dataset and two challenging TV-series datasets demonstrate the effectiveness of the proposed method.
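The abstract names the losses but gives no formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the "difference form" of the Fisher loss means trace(S_w) - lam * trace(S_b) computed on one mini-batch, and that the triplet ranking loss is a standard margin-based hinge on squared distances. The names `fisher_loss`, `triplet_ranking_loss`, `lam`, and `margin` are hypothetical.

```python
from collections import defaultdict


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fisher_loss(features, labels, lam=1.0):
    """Assumed difference-form Fisher criterion on one mini-batch:
    trace(S_w) - lam * trace(S_b). Lower is better."""
    by_class = defaultdict(list)
    for f, y in zip(features, labels):
        by_class[y].append(f)
    global_mean = mean(features)
    within = 0.0   # trace of the within-class scatter
    between = 0.0  # trace of the between-class scatter
    for members in by_class.values():
        class_mean = mean(members)
        within += sum(sq_dist(f, class_mean) for f in members)
        between += len(members) * sq_dist(class_mean, global_mean)
    return within - lam * between


def triplet_ranking_loss(anchor, positive, negative, margin=1.0):
    """Assumed hinge form: the anchor should be closer to the positive
    than to the negative by at least `margin` (squared distances)."""
    return max(0.0, margin + sq_dist(anchor, positive) - sq_dist(anchor, negative))
```

Because the difference form is a sum over samples and class means rather than a ratio of scatters, it can be evaluated on each mini-batch independently, which is consistent with the abstract's remark that it suits mini-batch-based optimization.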

Original language	English
Article number	8440769
Pages (from-to)	782-794
Number of pages	13
Journal	IEEE Transactions on Multimedia
Volume	21
Issue number	3
DOI	https://doi.org/10.1109/TMM.2018.2866222
Publication status	Published - Mar 2019

Cite this

@article{32c599bdc0ea4943bc1b4002231dbf10,

title = "Heterogeneous hashing network for face retrieval across image and video domains",

abstract = "In this paper, we present a heterogeneous hashing network to generate effective and compact hash representations of both face images and face videos for face retrieval across image and video domains. The network contains an image branch and a video branch to project face images and videos into a common space, respectively. Then, the non-linear hash functions are learned in the common space to obtain the corresponding binary hash representations. The network is trained with three loss functions: 1) the Fisher loss; 2) the softmax loss; and 3) the triplet ranking loss. The Fisher loss uses the difference form of within-class and between-class scatter and is appropriate for the mini-batch-based optimization method. The Fisher loss together with the softmax loss is exploited to enhance the discriminative power of the common space. The triplet ranking loss is enforced on the final binary hash representations to improve retrieval performance. Experiments on a large-scale face video dataset and two challenging TV-series datasets demonstrate the effectiveness of the proposed method.",

keywords = "Deep CNN, Face retrieval, Hash learning, Image and video domains",

author = "Chenchen Jing and Zhen Dong and Mingtao Pei and Yunde Jia",

note = "Publisher Copyright: {\textcopyright} 2018 IEEE.",

year = "2019",

month = mar,

doi = "10.1109/TMM.2018.2866222",

language = "English",

volume = "21",

pages = "782--794",

journal = "IEEE Transactions on Multimedia",

issn = "1520-9210",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "3",

}

TY - JOUR

T1 - Heterogeneous hashing network for face retrieval across image and video domains

AU - Jing, Chenchen

AU - Dong, Zhen

AU - Pei, Mingtao

AU - Jia, Yunde

PY - 2019/3

Y1 - 2019/3

N2 - In this paper, we present a heterogeneous hashing network to generate effective and compact hash representations of both face images and face videos for face retrieval across image and video domains. The network contains an image branch and a video branch to project face images and videos into a common space, respectively. Then, the non-linear hash functions are learned in the common space to obtain the corresponding binary hash representations. The network is trained with three loss functions: 1) the Fisher loss; 2) the softmax loss; and 3) the triplet ranking loss. The Fisher loss uses the difference form of within-class and between-class scatter and is appropriate for the mini-batch-based optimization method. The Fisher loss together with the softmax loss is exploited to enhance the discriminative power of the common space. The triplet ranking loss is enforced on the final binary hash representations to improve retrieval performance. Experiments on a large-scale face video dataset and two challenging TV-series datasets demonstrate the effectiveness of the proposed method.

KW - Deep CNN

KW - Face retrieval

KW - Hash learning

KW - Image and video domains

UR - http://www.scopus.com/inward/record.url?scp=85051809816&partnerID=8YFLogxK

U2 - 10.1109/TMM.2018.2866222

DO - 10.1109/TMM.2018.2866222

M3 - Article

AN - SCOPUS:85051809816

SN - 1520-9210

VL - 21

SP - 782

EP - 794

JO - IEEE Transactions on Multimedia

JF - IEEE Transactions on Multimedia

IS - 3

M1 - 8440769

ER -
