TY - JOUR
T1 - FingerPoseNet
T2 - A finger-level multitask learning network with residual feature sharing for 3D hand pose estimation
AU - Tewolde, Tekie Tsegay
AU - Manjotho, Ali Asghar
AU - Sarker, Prodip Kumar
AU - Niu, Zhendong
N1 - Publisher Copyright:
© 2025 Elsevier Ltd
PY - 2025/7
Y1 - 2025/7
AB - Hand pose estimation approaches commonly rely on shared hand feature maps to regress the 3D locations of all hand joints. Consequently, they struggle to enhance finger-level features, which are invaluable for capturing joint-to-finger associations and articulations. To address this limitation, we propose a finger-level multitask learning network with residual feature sharing, named FingerPoseNet, for accurate 3D hand pose estimation from a depth image. FingerPoseNet comprises three stages: (a) a shared base feature map extraction backbone based on a pre-trained ResNet-50; (b) a finger-level multitask learning stage that extracts and enhances feature maps for each finger and the palm; and (c) a multitask fusion layer that consolidates the estimates produced by each subtask. We exploit multitask learning by decoupling hand pose estimation into six subtasks, one for each finger and one for the palm. Each subtask is responsible for subtask-specific feature extraction, enhancement, and 3D keypoint regression. To enhance subtask-specific features, we propose a residual feature-sharing approach that mines supplementary information from all subtasks. Experiments on five challenging public hand pose datasets (ICVL, NYU, MSRA, Hands-2019-Task1, and HO3D-v3) demonstrate significant improvements in accuracy over state-of-the-art approaches.
KW - Hand pose estimation
KW - Information sharing
KW - Multitask learning
KW - User behavior modeling
KW - Virtual reality
UR - http://www.scopus.com/inward/record.url?scp=86000527195&partnerID=8YFLogxK
U2 - 10.1016/j.neunet.2025.107315
DO - 10.1016/j.neunet.2025.107315
M3 - Article
C2 - 40081269
AN - SCOPUS:86000527195
SN - 0893-6080
VL - 187
JO - Neural Networks
JF - Neural Networks
M1 - 107315
ER -
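
The abstract describes a three-stage pipeline: a shared pre-trained ResNet-50 backbone, six per-subtask branches (five fingers plus the palm) with residual feature sharing, and a fusion layer. The minimal PyTorch sketch below shows one way such a design could be wired up. All identifiers (SubtaskBranch, FingerPoseNet), layer widths, the per-subtask joint count, and the exact residual feature-sharing formulation are assumptions for illustration; this is not the authors' published implementation.

import torch
import torch.nn as nn
import torchvision.models as models

NUM_SUBTASKS = 6         # five fingers plus the palm, per the abstract
JOINTS_PER_SUBTASK = 3   # assumed joint count per subtask; dataset-dependent

class SubtaskBranch(nn.Module):
    # One finger/palm subtask: feature enhancement, then 3D keypoint regression.
    def __init__(self, in_ch=2048, mid_ch=256, joints=JOINTS_PER_SUBTASK):
        super().__init__()
        self.enhance = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1), nn.BatchNorm2d(mid_ch), nn.ReLU(),
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1), nn.BatchNorm2d(mid_ch), nn.ReLU())
        self.regress = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(mid_ch, joints * 3))  # (x, y, z) per joint

    def forward(self, shared, residual=None):
        feat = self.enhance(shared)
        if residual is not None:
            feat = feat + residual  # supplementary features from other subtasks
        return feat, self.regress(feat)

class FingerPoseNet(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        # Depth maps are single-channel; adapting conv1 is an assumption here.
        backbone.conv1 = nn.Conv2d(1, 64, 7, stride=2, padding=3, bias=False)
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])  # stage (a)
        self.branches = nn.ModuleList(SubtaskBranch() for _ in range(NUM_SUBTASKS))
        out_dim = NUM_SUBTASKS * JOINTS_PER_SUBTASK * 3
        self.fusion = nn.Linear(out_dim, out_dim)  # stage (c)

    def forward(self, depth):
        shared = self.backbone(depth)
        # Stage (b): a first pass extracts per-subtask feature maps.
        feats = [branch(shared)[0] for branch in self.branches]
        total = torch.stack(feats).sum(dim=0)
        preds = []
        for branch, own in zip(self.branches, feats):
            # One plausible residual-sharing form (assumed, not the paper's
            # exact scheme): add the mean of the other subtasks' features.
            others = (total - own) / (NUM_SUBTASKS - 1)
            preds.append(branch(shared, residual=others)[1])
        return self.fusion(torch.cat(preds, dim=1))

Usage, assuming 224 x 224 depth crops:

net = FingerPoseNet()
coords = net(torch.randn(2, 1, 224, 224))  # shape (2, 54): 18 joints x (x, y, z)

Routing each branch the aggregate of the other branches' features, rather than its own, is what lets a subtask mine supplementary cross-finger information while its own residual path preserves subtask-specific features.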