
MoViSense: Multiview Spatiotemporal Transformer for 3-D Human Kinematics Sensing

  • Dezhi Zheng
  • Zhiyi Yang
  • Wenfeng Liu
  • Xiao Liang
  • Chun Hu*
  • Kang Ma*
  • *Corresponding author for this work
  • Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

Human pose estimation has wide applications in health monitoring, disease diagnosis, and motion rehabilitation. These applications rely on motion assessment using kinematic parameters, which can be obtained from the coordinates of human body key points. Most current noncontact key point measurement methods rely on multiple cameras to reconstruct the human body in realistic multiperson interaction and occlusion scenarios. However, sparse camera configurations often limit pose estimation accuracy, while increasing the number of viewpoints to support high-accuracy kinematic analysis reduces efficiency. In this study, MoViSense is proposed for 3-D human pose estimation and kinematic analysis under sparse camera configurations by exploiting spatial and temporal continuity. Built on the Transformer, its encoder integrates a multiscale gated feedforward (MSFF) module to enhance spatial representations and cross-view alignment, while a dynamic history fusion-deformable multiscale attention (DHF-DMA) module uses the temporal continuity of human motion to improve robustness under occlusion. In addition, a biomechanical constraint mechanism (BCM) enforces bone-length consistency, and a motion kinetics extractor (MKE) converts the estimated 3-D key points into interpretable kinematic parameters. Experiments on the CMU Panoptic dataset show that MoViSense achieves an AP50 of 86.49 and an MPJPE of 27.85 mm, outperforming other representative methods under sparse camera configurations. The relative deviation (RD) is 5.37% for stride length and 1.45% for cadence.
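
The quantities named in the abstract can be illustrated with a short sketch. It is not the authors' implementation: MPJPE follows its usual definition (mean Euclidean distance between predicted and reference joints), while the RD formula and the bone-length spread measure are assumptions, since the abstract does not define them. All function names and sample coordinates are hypothetical.

```python
import math


def mpjpe(pred, truth):
    """Mean per-joint position error: mean Euclidean distance over joints.

    The result is in the same unit as the input coordinates (e.g. mm).
    """
    if not pred or len(pred) != len(truth):
        raise ValueError("pred and truth must be non-empty and equally long")
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(pred)


def bone_length_spread(frames, parent, child):
    """Max minus min length of one bone across frames.

    Assumed illustration of 'bone-length consistency': 0 means the bone
    keeps exactly the same length in every frame.
    """
    lengths = [math.dist(frame[parent], frame[child]) for frame in frames]
    return max(lengths) - min(lengths)


def relative_deviation(estimate, reference):
    """Relative deviation in percent (assumed formula):
    |estimate - reference| / |reference| * 100.
    """
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return abs(estimate - reference) / abs(reference) * 100.0


# Hypothetical two-joint skeleton, coordinates in mm.
predicted = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
error_mm = mpjpe(predicted, reference)

# Same bone (joint 0 to joint 1) observed in two frames.
frames = [
    [(0.0, 0.0, 0.0), (0.0, 0.0, 400.0)],
    [(10.0, 0.0, 0.0), (10.0, 0.0, 410.0)],
]
spread_mm = bone_length_spread(frames, parent=0, child=1)

# Hypothetical stride length in metres: estimated vs. reference.
stride_rd = relative_deviation(1.05, 1.00)
```

With these definitions, a lower MPJPE means estimated key points lie closer to the reference, and a lower RD means a derived gait parameter such as stride length or cadence is closer to its reference value.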

Original language: English
Pages (from-to): 16027-16036
Number of pages: 10
Journal: IEEE Sensors Journal
Volume: 26
Issue number: 10
Publication status: Published - 1 May 2026

Keywords

  • Kinematic analysis
  • Transformer
  • motion assessment
  • multiview 3-D human pose estimation
