TY - JOUR
T1 - Landmark and Pose Prediction in Occluded Facial Point Cloud via Explicit Joint Feature Fusion Network
AU - Yang, Yifei
AU - Fan, Jingfan
AU - Shao, Long
AU - Lei, Mingyang
AU - Fu, Tianyu
AU - Ai, Danni
AU - Xiao, Deqiang
AU - Song, Hong
AU - Lin, Yucong
AU - Yang, Jian
N1 - Publisher Copyright:
© 1991-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Facial point clouds collected in practical applications often suffer from pose variation and occlusion. Existing studies typically focus on either pose estimation or landmark localization and fail to fully exploit the complementary information among facial features, which limits prediction accuracy. We therefore propose an innovative 3D facial multi-task prediction network. Guided by the physical dependencies between tasks, the network embeds the outputs of related tasks into feature extraction from the point level to the global level. This enables explicit multi-task knowledge transfer and the simultaneous prediction of facial landmarks, occlusion, and head pose. We introduce a training strategy based on posterior knowledge correction to iteratively refine the multi-task predictions. Moreover, because no single dataset provides annotations for all of these tasks at once, we synthesized a 3D landmarks, occlusion and pose (3D-LOP) dataset, which includes annotations for landmark coordinates, occlusion probability, and head pose. The proposed method was compared with state-of-the-art methods on two public datasets and 3D-LOP. Landmark localization accuracy improved by 7.1% on the two public datasets, and pose estimation accuracy and stability on 3D-LOP improved by 28.5% and 32.7%, respectively. Performance on in-the-wild data further demonstrates the method's potential in practical applications.
AB - Facial point clouds collected in practical applications often suffer from pose variation and occlusion. Existing studies typically focus on either pose estimation or landmark localization and fail to fully exploit the complementary information among facial features, which limits prediction accuracy. We therefore propose an innovative 3D facial multi-task prediction network. Guided by the physical dependencies between tasks, the network embeds the outputs of related tasks into feature extraction from the point level to the global level. This enables explicit multi-task knowledge transfer and the simultaneous prediction of facial landmarks, occlusion, and head pose. We introduce a training strategy based on posterior knowledge correction to iteratively refine the multi-task predictions. Moreover, because no single dataset provides annotations for all of these tasks at once, we synthesized a 3D landmarks, occlusion and pose (3D-LOP) dataset, which includes annotations for landmark coordinates, occlusion probability, and head pose. The proposed method was compared with state-of-the-art methods on two public datasets and 3D-LOP. Landmark localization accuracy improved by 7.1% on the two public datasets, and pose estimation accuracy and stability on 3D-LOP improved by 28.5% and 32.7%, respectively. Performance on in-the-wild data further demonstrates the method's potential in practical applications.
KW - 3D landmark localization
KW - Explicit feature fusion
KW - Head pose estimation
KW - Occlusion probability prediction
UR - http://www.scopus.com/inward/record.url?scp=105003671282&partnerID=8YFLogxK
U2 - 10.1109/TCSVT.2025.3563402
DO - 10.1109/TCSVT.2025.3563402
M3 - Article
AN - SCOPUS:105003671282
SN - 1051-8215
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
ER -