Abstract
Human pose estimation, a pivotal task in computer vision, finds applications in diverse fields such as action recognition, human–computer interaction, and healthcare. Despite recent advances leveraging powerful backbone networks, challenges remain in fully exploiting structural priors and spatial dependencies. This paper introduces PyraFusionPose, a novel pose estimation framework featuring a task-specific backbone architecture and an uncertainty-aware regression head. Our approach integrates both fine-grained local details and global contextual information, enhancing accuracy and robustness. Extensive experiments on the COCO benchmark demonstrate a 0.6% improvement in AP over the state-of-the-art model, ViTPose, with superior performance in complex scenarios such as crowded scenes and occlusions. These findings establish a new standard for human pose estimation and underscore the importance of task-specific design and uncertainty modeling. Source code and models are available at https://github.com/Eucliwood-bsb/PyraFusionPose.
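The abstract names an uncertainty-aware regression head but does not give its formulation. A common way to realize this idea is heteroscedastic regression: the head predicts a mean coordinate and a log-variance per keypoint and is trained with a Gaussian negative log-likelihood, so the model can down-weight ambiguous (e.g. occluded) keypoints. The sketch below is a minimal illustration under that assumption only; the class name `UncertaintyAwareHead`, the loss, and all shapes are hypothetical and not taken from PyraFusionPose itself.

```python
# Hypothetical sketch of an uncertainty-aware keypoint regression head.
# The Gaussian NLL formulation is an assumption about what "uncertainty-aware
# regression" means here, not the authors' documented design.
import torch
import torch.nn as nn


class UncertaintyAwareHead(nn.Module):
    """Predicts 2D keypoint coordinates plus a per-coordinate log-variance."""

    def __init__(self, in_dim: int, num_keypoints: int = 17):
        super().__init__()
        self.num_keypoints = num_keypoints
        # Mean branch: (x, y) per keypoint; uncertainty branch: log sigma^2.
        self.mu = nn.Linear(in_dim, num_keypoints * 2)
        self.log_var = nn.Linear(in_dim, num_keypoints * 2)

    def forward(self, feats: torch.Tensor):
        # feats: (B, in_dim) pooled backbone features.
        mu = self.mu(feats).view(-1, self.num_keypoints, 2)
        log_var = self.log_var(feats).view(-1, self.num_keypoints, 2)
        return mu, log_var


def gaussian_nll(mu, log_var, target, visible):
    """Heteroscedastic Gaussian NLL (up to an additive constant).

    Errors on confident predictions are penalized more heavily, while the
    head can raise the predicted variance for hard keypoints instead of
    being forced to fit them exactly.
    """
    nll = 0.5 * (log_var + (target - mu) ** 2 / log_var.exp())
    # visible: (B, K, 1) in {0, 1}; mask out unannotated keypoints.
    return (nll * visible).sum() / visible.sum().clamp(min=1)
```

Under this formulation the predicted variance doubles as a per-keypoint confidence score at inference time, which is one plausible mechanism behind the robustness to crowding and occlusion the abstract reports.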
| Field | Value |
|---|---|
| Original language | English |
| Article number | 143 |
| Journal | Visual Computer |
| Volume | 42 |
| Issue number | 2 |
| DOIs | |
| Publication status | Published - Jan 2026 |
| Externally published | Yes |
Keywords
- Human pose estimation
- Hybrid transformer
- Pyramid fusion
- Uncertainty-aware regression