TY - JOUR
T1 - Enhancing UAV Human-Machine Interaction With Multimodal Behavioral Data
T2 - A Gaze-Posture Synergistic Approach
AU - Wang, Jintao
AU - Lu, Gen
AU - Gao, Yujie
AU - Hu, Bin
AU - Yang, Minqiang
N1 - Publisher Copyright:
© 1975-2011 IEEE.
PY - 2025
Y1 - 2025
N2 - Recent advances in human-machine interaction (HMI) for unmanned aerial vehicles (UAVs) have highlighted the limitations of traditional single-modal interfaces, which often suffer from limited adaptability and vulnerability to sensor-level anomalies. These challenges become particularly critical when environmental noise or potential adversarial interference could compromise operational safety. To address these issues, this paper proposes the Gaze-Posture Integrated UAV (GPIUAV) framework, a multimodal interaction system that fuses eye-tracking inputs with head pose measurements. The framework incorporates three core modules: a head pose estimation algorithm based on Kalman filtering, a real-time gaze-to-scene coordinate mapping method, and a multimodal fusion control scheme. Experimental evaluation in real-world UAV tasks demonstrates that GPIUAV achieves an average control accuracy of 92.5%, a mean response time of 1.106 seconds, and consistent task completion performance. This consistency, in contrast to manual operation, indicates that the framework reduces reliance on operator proficiency and enhances control stability. These results validate the system’s effectiveness in enabling intuitive and precise control of UAVs. Furthermore, the fusion of complementary behavioral signals offers a pathway to future improvements in operational safety and resilience through cross-modal consistency checks. The GPIUAV framework thus contributes to more reliable human-UAV collaboration in domains such as medical delivery, urban monitoring, and emergency response.
AB - Recent advances in human-machine interaction (HMI) for unmanned aerial vehicles (UAVs) have highlighted the limitations of traditional single-modal interfaces, which often suffer from limited adaptability and vulnerability to sensor-level anomalies. These challenges become particularly critical when environmental noise or potential adversarial interference could compromise operational safety. To address these issues, this paper proposes the Gaze-Posture Integrated UAV (GPIUAV) framework, a multimodal interaction system that fuses eye-tracking inputs with head pose measurements. The framework incorporates three core modules: a head pose estimation algorithm based on Kalman filtering, a real-time gaze-to-scene coordinate mapping method, and a multimodal fusion control scheme. Experimental evaluation in real-world UAV tasks demonstrates that GPIUAV achieves an average control accuracy of 92.5%, a mean response time of 1.106 seconds, and consistent task completion performance. This consistency, in contrast to manual operation, indicates that the framework reduces reliance on operator proficiency and enhances control stability. These results validate the system’s effectiveness in enabling intuitive and precise control of UAVs. Furthermore, the fusion of complementary behavioral signals offers a pathway to future improvements in operational safety and resilience through cross-modal consistency checks. The GPIUAV framework thus contributes to more reliable human-UAV collaboration in domains such as medical delivery, urban monitoring, and emergency response.
KW - Unmanned aerial vehicle (UAV)
KW - eye-tracking
KW - head posture
KW - human-machine interaction (HMI)
KW - multimodal fusion
UR - https://www.scopus.com/pages/publications/105012369845
U2 - 10.1109/TCE.2025.3593889
DO - 10.1109/TCE.2025.3593889
M3 - Article
AN - SCOPUS:105012369845
SN - 0098-3063
VL - 71
SP - 8033
EP - 8044
JO - IEEE Transactions on Consumer Electronics
JF - IEEE Transactions on Consumer Electronics
IS - 3
ER -