TY - JOUR
T1 - Differentiable Multi-Granularity Human Parsing
AU - Zhou, Tianfei
AU - Yang, Yi
AU - Wang, Wenguan
N1 - Publisher Copyright:
© 1979-2012 IEEE.
PY - 2023/7/1
Y1 - 2023/7/1
N2 - In this work, we study the challenging problem of instance-aware human body part parsing. We introduce a new bottom-up regime which achieves the task through learning category-level human semantic segmentation as well as multi-person pose estimation in a joint and end-to-end manner. The output is a compact, efficient and powerful framework that exploits structural information over different human granularities and eases the difficulty of person partitioning. Specifically, a dense-to-sparse projection field, which allows explicitly associating dense human semantics with sparse keypoints, is learnt and progressively improved over the network feature pyramid for robustness. Then, the difficult pixel grouping problem is cast as an easier, multi-person joint assembling task. By formulating joint association as maximum-weight bipartite matching, we develop two novel algorithms based on projected gradient descent and unbalanced optimal transport, respectively, to solve the matching problem differentiablly. These algorithms make our method end-to-end trainable and allow back-propagating the grouping error to directly supervise multi-granularity human representation learning. This is significantly distinguished from current bottom-up human parsers or pose estimators which require sophisticated post-processing or heuristic greedy algorithms. Extensive experiments on three instance-aware human parsing datasets (i.e., MHP-v2, DensePose-COCO, PASCAL-Person-Part) demonstrate that our approach outperforms most existing human parsers with much more efficient inference. Our code is available at https://github.com/tfzhou/MG-HumanParsing.
AB - In this work, we study the challenging problem of instance-aware human body part parsing. We introduce a new bottom-up regime which achieves the task through learning category-level human semantic segmentation as well as multi-person pose estimation in a joint and end-to-end manner. The output is a compact, efficient and powerful framework that exploits structural information over different human granularities and eases the difficulty of person partitioning. Specifically, a dense-to-sparse projection field, which allows explicitly associating dense human semantics with sparse keypoints, is learnt and progressively improved over the network feature pyramid for robustness. Then, the difficult pixel grouping problem is cast as an easier, multi-person joint assembling task. By formulating joint association as maximum-weight bipartite matching, we develop two novel algorithms based on projected gradient descent and unbalanced optimal transport, respectively, to solve the matching problem differentiablly. These algorithms make our method end-to-end trainable and allow back-propagating the grouping error to directly supervise multi-granularity human representation learning. This is significantly distinguished from current bottom-up human parsers or pose estimators which require sophisticated post-processing or heuristic greedy algorithms. Extensive experiments on three instance-aware human parsing datasets (i.e., MHP-v2, DensePose-COCO, PASCAL-Person-Part) demonstrate that our approach outperforms most existing human parsers with much more efficient inference. Our code is available at https://github.com/tfzhou/MG-HumanParsing.
KW - Instance-aware human semantic parsing
KW - multi-granularity human representation
KW - multi-person pose estimation
UR - http://www.scopus.com/inward/record.url?scp=85148466195&partnerID=8YFLogxK
U2 - 10.1109/TPAMI.2023.3239194
DO - 10.1109/TPAMI.2023.3239194
M3 - Article
C2 - 37022259
AN - SCOPUS:85148466195
SN - 0162-8828
VL - 45
SP - 8296
EP - 8310
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS - 7
ER -