A deep Coarse-to-Fine network for head pose estimation from synthetic data

Yujia Wang; Wei Liang; Jianbing Shen; Yunde Jia; Lap Fai Yu

doi:10.1016/j.patcog.2019.05.026

Yujia Wang, Wei Liang^*, Jianbing Shen, Yunde Jia, Lap Fai Yu

^*Corresponding author for this work

School of Computer Science

Research output: Contribution to journal › Article › Peer-reviewed

69 citations (Scopus)

Abstract

Various applications of human-computer interaction are based on the estimation of head pose, which is challenging due to different facial appearance, inhomogeneous illumination, partial occlusion, etc. In this paper, we propose a deep neural network following the Coarse-to-Fine strategy to estimate head poses. The scheme includes two branches: Coarse classification phase classifying the input image into four categories, and Fine Regression phase estimating the accurate pose parameters. The two sub-networks are trained jointly. To tackle the problem of insufficient annotated data in training process, we design a rendering pipeline to synthesize realistic head images and generate an annotated dataset with a collection of 310k head poses. The results on benchmark datasets and synthetic dataset validate the effectiveness of our approach, as well as the results on images with diverse illumination, occlusion, and motion blur. Moreover, our method can be easily extended to estimate head poses on depth images.

Original language	English
Pages (from-to)	196-206
Number of pages	11
Journal	Pattern Recognition
Volume	94
DOI	https://doi.org/10.1016/j.patcog.2019.05.026
Publication status	Published - Oct 2019

Access to document

10.1016/j.patcog.2019.05.026

Cite this

@article{9fc8647eb70a45dfae85aaaf4c20c58f,
  title = "A deep Coarse-to-Fine network for head pose estimation from synthetic data",
  abstract = "Various applications of human-computer interaction are based on the estimation of head pose, which is challenging due to different facial appearance, inhomogeneous illumination, partial occlusion, etc. In this paper, we propose a deep neural network following the Coarse-to-Fine strategy to estimate head poses. The scheme includes two branches: Coarse classification phase classifying the input image into four categories, and Fine Regression phase estimating the accurate pose parameters. The two sub-networks are trained jointly. To tackle the problem of insufficient annotated data in training process, we design a rendering pipeline to synthesize realistic head images and generate an annotated dataset with a collection of 310k head poses. The results on benchmark datasets and synthetic dataset validate the effectiveness of our approach, as well as the results on images with diverse illumination, occlusion, and motion blur. Moreover, our method can be easily extended to estimate head poses on depth images.",
  keywords = "Coarse-to-Fine, Head pose estimation, Joint learning",
  author = "Yujia Wang and Wei Liang and Jianbing Shen and Yunde Jia and Yu, {Lap Fai}",
  note = "Publisher Copyright: {\textcopyright} 2019 Elsevier Ltd",
  year = "2019",
  month = oct,
  doi = "10.1016/j.patcog.2019.05.026",
  language = "English",
  volume = "94",
  pages = "196--206",
  journal = "Pattern Recognition",
  issn = "0031-3203",
  publisher = "Elsevier Ltd.",
}

TY  - JOUR
T1  - A deep Coarse-to-Fine network for head pose estimation from synthetic data
AU  - Wang, Yujia
AU  - Liang, Wei
AU  - Shen, Jianbing
AU  - Jia, Yunde
AU  - Yu, Lap Fai
PY  - 2019/10
Y1  - 2019/10
N2  - Various applications of human-computer interaction are based on the estimation of head pose, which is challenging due to different facial appearance, inhomogeneous illumination, partial occlusion, etc. In this paper, we propose a deep neural network following the Coarse-to-Fine strategy to estimate head poses. The scheme includes two branches: Coarse classification phase classifying the input image into four categories, and Fine Regression phase estimating the accurate pose parameters. The two sub-networks are trained jointly. To tackle the problem of insufficient annotated data in training process, we design a rendering pipeline to synthesize realistic head images and generate an annotated dataset with a collection of 310k head poses. The results on benchmark datasets and synthetic dataset validate the effectiveness of our approach, as well as the results on images with diverse illumination, occlusion, and motion blur. Moreover, our method can be easily extended to estimate head poses on depth images.
AB  - Various applications of human-computer interaction are based on the estimation of head pose, which is challenging due to different facial appearance, inhomogeneous illumination, partial occlusion, etc. In this paper, we propose a deep neural network following the Coarse-to-Fine strategy to estimate head poses. The scheme includes two branches: Coarse classification phase classifying the input image into four categories, and Fine Regression phase estimating the accurate pose parameters. The two sub-networks are trained jointly. To tackle the problem of insufficient annotated data in training process, we design a rendering pipeline to synthesize realistic head images and generate an annotated dataset with a collection of 310k head poses. The results on benchmark datasets and synthetic dataset validate the effectiveness of our approach, as well as the results on images with diverse illumination, occlusion, and motion blur. Moreover, our method can be easily extended to estimate head poses on depth images.
KW  - Coarse-to-Fine
KW  - Head pose estimation
KW  - Joint learning
UR  - http://www.scopus.com/inward/record.url?scp=85066290070&partnerID=8YFLogxK
U2  - 10.1016/j.patcog.2019.05.026
DO  - 10.1016/j.patcog.2019.05.026
M3  - Article
AN  - SCOPUS:85066290070
SN  - 0031-3203
VL  - 94
SP  - 196
EP  - 206
JO  - Pattern Recognition
JF  - Pattern Recognition
ER  -
