TY - JOUR
T1 - TAMFN: Time-Aware Attention Multimodal Fusion Network for Depression Detection
AU - Zhou, Li
AU - Liu, Zhenyu
AU - Shangguan, Zixuan
AU - Yuan, Xiaoyan
AU - Li, Yutong
AU - Hu, Bin
N1 - Publisher Copyright:
© 2001-2011 IEEE.
PY - 2023
Y1 - 2023
N2 - In recent years, with the widespread popularity of the Internet, social media has become an indispensable part of people's lives. People regard online social media as an essential tool for interaction and communication. Owing to the convenience of acquiring data from social media, mental health research based on social media has received considerable attention. Early detection of psychological disorders from social media can help prevent further deterioration in at-risk people. In this paper, depression detection is performed based on the non-verbal (acoustic and visual) behaviors in vlogs. We propose a time-aware attention-based multimodal fusion depression detection network (TAMFN) to fully mine and fuse multimodal features. The TAMFN model is composed of a temporal convolutional network with global information (GTCN), an intermodal feature extraction (IFE) module, and a time-aware attention multimodal fusion (TAMF) module. The GTCN captures more temporal behavior information by combining local and global temporal information. The IFE module extracts early interaction information between modalities to enrich the feature representation. The TAMF module guides multimodal feature fusion by mining the temporal importance of the different modalities. Our experiments are carried out on the D-Vlog dataset, and the comparative results show that the proposed TAMFN outperforms all benchmark models, demonstrating its effectiveness.
AB - In recent years, with the widespread popularity of the Internet, social media has become an indispensable part of people's lives. People regard online social media as an essential tool for interaction and communication. Owing to the convenience of acquiring data from social media, mental health research based on social media has received considerable attention. Early detection of psychological disorders from social media can help prevent further deterioration in at-risk people. In this paper, depression detection is performed based on the non-verbal (acoustic and visual) behaviors in vlogs. We propose a time-aware attention-based multimodal fusion depression detection network (TAMFN) to fully mine and fuse multimodal features. The TAMFN model is composed of a temporal convolutional network with global information (GTCN), an intermodal feature extraction (IFE) module, and a time-aware attention multimodal fusion (TAMF) module. The GTCN captures more temporal behavior information by combining local and global temporal information. The IFE module extracts early interaction information between modalities to enrich the feature representation. The TAMF module guides multimodal feature fusion by mining the temporal importance of the different modalities. Our experiments are carried out on the D-Vlog dataset, and the comparative results show that the proposed TAMFN outperforms all benchmark models, demonstrating its effectiveness.
KW - Depression
KW - automatic detection
KW - non-verbal behaviors
KW - time-aware attention-based multimodal fusion depression detection network (TAMFN)
KW - vlog
UR - http://www.scopus.com/inward/record.url?scp=85144042686&partnerID=8YFLogxK
U2 - 10.1109/TNSRE.2022.3224135
DO - 10.1109/TNSRE.2022.3224135
M3 - Article
C2 - 36417750
AN - SCOPUS:85144042686
SN - 1534-4320
VL - 31
SP - 669
EP - 679
JO - IEEE Transactions on Neural Systems and Rehabilitation Engineering
JF - IEEE Transactions on Neural Systems and Rehabilitation Engineering
ER -