TY - JOUR
T1 - UMD-Net: A Unified Multi-Task Assistive Driving Network Based on Multimodal Fusion
T2 - IEEE Transactions on Intelligent Transportation Systems
AU - Liu, Wenzhuo
AU - Qiao, Yicheng
AU - Li, Zhiwei
AU - Wang, Wenshuo
AU - Zhang, Wei
AU - Zhu, Jiayin
AU - Jiang, Yanhuan
AU - Wang, Li
AU - Wang, Hong
AU - Liu, Huaping
AU - Wang, Kunfeng
N1 - Publisher Copyright:
© 2000-2011 IEEE.
PY - 2025
Y1 - 2025
N2 - In recent years, researchers have focused on recognizing tasks related to driver state, traffic environment, and other factors to enhance the safety of autonomous driving assistance systems. However, current research addresses these tasks independently, neglecting the interconnections among the driver, the traffic environment, and the vehicle. In this paper, we propose a Unified Multi-task Assistive Driving Network Based on Multimodal Fusion (UMD-Net), the first unified model capable of recognizing four tasks simultaneously from multimodal data: driver behavior recognition, driver emotion recognition, traffic context recognition, and vehicle behavior recognition. To better exploit the synergy among these tasks, we design a position-sensitive multi-directional attention feature extraction subnetwork and a recursive dynamic feature fusion module. The former captures key features of multi-view images through attention applied along different directions, improving the model's generalization across tasks. The latter dynamically adjusts fusion weights according to the multimodal features, strengthening the representation of important features in multi-task learning. Evaluated on the public AIDE dataset, our model achieves the best performance on all four tasks, including 95.31% accuracy on traffic context recognition, demonstrating the superiority of our approach.
AB - In recent years, researchers have focused on recognizing tasks related to driver state, traffic environment, and other factors to enhance the safety of autonomous driving assistance systems. However, current research addresses these tasks independently, neglecting the interconnections among the driver, the traffic environment, and the vehicle. In this paper, we propose a Unified Multi-task Assistive Driving Network Based on Multimodal Fusion (UMD-Net), the first unified model capable of recognizing four tasks simultaneously from multimodal data: driver behavior recognition, driver emotion recognition, traffic context recognition, and vehicle behavior recognition. To better exploit the synergy among these tasks, we design a position-sensitive multi-directional attention feature extraction subnetwork and a recursive dynamic feature fusion module. The former captures key features of multi-view images through attention applied along different directions, improving the model's generalization across tasks. The latter dynamically adjusts fusion weights according to the multimodal features, strengthening the representation of important features in multi-task learning. Evaluated on the public AIDE dataset, our model achieves the best performance on all four tasks, including 95.31% accuracy on traffic context recognition, demonstrating the superiority of our approach.
KW - ADAS
KW - driver state recognition
KW - multi-task learning
KW - multimodal fusion
KW - traffic environment recognition
UR - http://www.scopus.com/inward/record.url?scp=105002827562&partnerID=8YFLogxK
U2 - 10.1109/TITS.2025.3556852
DO - 10.1109/TITS.2025.3556852
M3 - Article
AN - SCOPUS:105002827562
SN - 1524-9050
JO - IEEE Transactions on Intelligent Transportation Systems
JF - IEEE Transactions on Intelligent Transportation Systems
ER -