Human parsing by weak structural label

Zhiyong Chen, Si Liu, Yanlong Zhai*, Jia Lin, Xiaochun Cao, Liang Yang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

Human parsing, which decomposes a human-centric image into several semantic labels, e.g., face, skin, etc., has been an active topic in recent years. Traditional human parsing methods are conducted in a supervised setting, i.e., pixel-wise labels are available during the training process, which requires tedious human labeling effort. In this paper, we propose a weakly supervised deep parsing method to relieve humans of this time-consuming labeling. More specifically, we train a robust human parser with structural image-level labels, e.g., “red jeans”. A structural label contains an attribute, e.g., “red”, as well as a class label, e.g., “jeans”. Our framework is based on the Fully Convolutional Network (FCN) (Pathak et al. 2014) with two critical differences. First, the pixel-wise loss function of FCN (Pathak et al. 2014) is replaced by an image-level loss that aggregates the pixel-wise predictions of the whole image in a multiple-instance-learning manner. Second, we develop a novel logistic pooling layer that constrains the pixels responding to the attribute label and its corresponding category label to be the same, thereby interpreting the structural label. Extensive experiments on the publicly available dataset (Liu et al. IEEE Trans Multimedia 16(1):253–265, 2014) show the effectiveness of the proposed method.
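The central mechanism described above is to replace FCN's pixel-wise loss with an image-level loss obtained by pooling per-pixel class scores over the whole image, supervised only by image-level labels. Below is a minimal sketch of that MIL-style aggregation, assuming a PyTorch setup; the function name, the log-sum-exp pooling choice, and all shapes are illustrative assumptions rather than the authors' implementation, and the paper's logistic pooling layer tying attribute and class responses is not reproduced here.

import math
import torch
import torch.nn.functional as F

def mil_image_level_loss(pixel_logits, image_labels, r=4.0):
    # pixel_logits: (B, C, H, W) per-pixel class scores from an FCN-style parser head.
    # image_labels: (B, C) multi-hot image-level labels (1 if the class appears in the image).
    # r: sharpness of the log-sum-exp pooling; larger r approaches max pooling.
    b, c, h, w = pixel_logits.shape
    flat = pixel_logits.view(b, c, h * w)
    # Aggregate all pixel scores of each class into a single image-level "bag" score.
    image_logits = (torch.logsumexp(r * flat, dim=2) - math.log(h * w)) / r
    # Supervise with image-level labels only, replacing the pixel-wise FCN loss.
    return F.binary_cross_entropy_with_logits(image_logits, image_labels)

# Example: batch of 2 images, 5 label classes, 8x8 score maps.
scores = torch.randn(2, 5, 8, 8)
labels = torch.tensor([[1., 0., 1., 0., 0.],
                       [0., 1., 0., 0., 1.]])
loss = mil_image_level_loss(scores, labels)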

Original language: English
Pages (from-to): 19795-19809
Number of pages: 15
Journal: Multimedia Tools and Applications
Volume: 77
Issue number: 15
Publication status: Published - 1 Aug 2018
