Enhancing human pose estimation accuracy with pyramid fusion Vision Transformers

Research output: Contribution to journal › Article › peer-review

Abstract

Human pose estimation, a pivotal task in computer vision, finds applications in diverse fields such as action recognition, human–computer interaction, and healthcare. Despite recent advances leveraging powerful backbone networks, challenges remain in fully exploiting structural priors and spatial dependencies. This paper introduces PyraFusionPose, a novel pose estimation framework featuring a task-specific backbone architecture and an uncertainty-aware regression head. Our approach integrates both fine-grained local details and global contextual information, enhancing accuracy and robustness. Extensive experiments on the COCO benchmark demonstrate a 0.6% improvement in AP over the state-of-the-art model, ViTPose, with superior performance in complex scenarios such as crowded scenes and occlusions. These findings establish a new standard for human pose estimation and underscore the importance of task-specific design and uncertainty modeling. Source code and models are available at https://github.com/Eucliwood-bsb/PyraFusionPose.

Original language: English
Article number: 143
Journal: Visual Computer
Volume: 42
Issue number: 2
DOIs
Publication status: Published - Jan 2026
Externally published: Yes

Keywords

  • Human pose estimation
  • Hybrid transformer
  • Pyramid fusion
  • Uncertainty-aware regression
