TY - JOUR
T1 - Two-Stage Representation Refinement Based on Convex Combination for 3-D Human Poses Estimation
AU - Chen, Luefeng
AU - Cao, Wei
AU - Zheng, Biao
AU - Wu, Min
AU - Pedrycz, Witold
AU - Hirota, Kaoru
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - In the human pose estimation task, on the one hand, 3-D pose always has difficulty in dividing different 2-D poses if the view is limited; on the other hand, it is hard to reduce the lifting ambiguity because of the lack of depth information, it is an important and challenging problem. Therefore, two-stage representation refinement based on the convex combination for 3-D human pose estimation is proposed, in which the two-stage method includes a dense-spatial-temporal convolutional network and a local-to-refine network. The former is applied to determine the features between each video frame; the latter is used to get the different scales of pose details. It aims to address the difficulty of estimating 3-D human pose from 2-D image sequences. In such a way, it can better use the relations between every frame in the sequence of the pose video to produce more accurate results. Finally, we combine the above network with a block called convex combination to help refine the 3-D pose location. We test the proposed approach on both Human3.6m and MPII datasets. The result confirms that our method can achieve better performance than improved CNN supervision, a simple yet effective baseline, and coarse-to-fine volumetric prediction. Besides, a robustness test experiment is carried out for the proposed method while the input is interrupted. The result verifies that our method shows better robustness.
AB - In the human pose estimation task, on the one hand, 3-D pose always has difficulty in dividing different 2-D poses if the view is limited; on the other hand, it is hard to reduce the lifting ambiguity because of the lack of depth information, it is an important and challenging problem. Therefore, two-stage representation refinement based on the convex combination for 3-D human pose estimation is proposed, in which the two-stage method includes a dense-spatial-temporal convolutional network and a local-to-refine network. The former is applied to determine the features between each video frame; the latter is used to get the different scales of pose details. It aims to address the difficulty of estimating 3-D human pose from 2-D image sequences. In such a way, it can better use the relations between every frame in the sequence of the pose video to produce more accurate results. Finally, we combine the above network with a block called convex combination to help refine the 3-D pose location. We test the proposed approach on both Human3.6m and MPII datasets. The result confirms that our method can achieve better performance than improved CNN supervision, a simple yet effective baseline, and coarse-to-fine volumetric prediction. Besides, a robustness test experiment is carried out for the proposed method while the input is interrupted. The result verifies that our method shows better robustness.
KW - 3-D pose estimation
KW - kinematic constraints
KW - pose sequence
UR - http://www.scopus.com/inward/record.url?scp=85199529009&partnerID=8YFLogxK
U2 - 10.1109/TAI.2024.3432028
DO - 10.1109/TAI.2024.3432028
M3 - Article
AN - SCOPUS:85199529009
SN - 2691-4581
VL - 5
SP - 6500
EP - 6508
JO - IEEE Transactions on Artificial Intelligence
JF - IEEE Transactions on Artificial Intelligence
IS - 12
ER -