TY - JOUR
T1 - TAMFN: Time-Aware Attention Multimodal Fusion Network for Depression Detection
AU - Zhou, Li
AU - Liu, Zhenyu
AU - Shangguan, Zixuan
AU - Yuan, Xiaoyan
AU - Li, Yutong
AU - Hu, Bin
N1 - Publisher Copyright:
© 2001-2011 IEEE.
PY - 2023
Y1 - 2023
N2 - In recent years, with the widespread popularity of the Internet, social media has become an indispensable part of people's lives. People regard online social media as an essential tool for interaction and communication. Owing to the convenience of acquiring data from social media, mental health research based on social media has received considerable attention. Early detection of psychological disorders from social media can help prevent further deterioration in at-risk people. In this paper, depression detection is performed based on the non-verbal (acoustic and visual) behaviors in vlogs. We propose a time-aware attention-based multimodal fusion depression detection network (TAMFN) to fully mine and fuse multimodal features. The TAMFN model is composed of a temporal convolutional network with global information (GTCN), an intermodal feature extraction (IFE) module, and a time-aware attention multimodal fusion (TAMF) module. The GTCN captures more temporal behavior information by combining local and global temporal information. The IFE module extracts early interaction information between modalities to enrich the feature representation. The TAMF module guides multimodal feature fusion by mining the temporal importance of the different modalities. Our experiments are carried out on the D-Vlog dataset, and the comparative results show that the proposed TAMFN outperforms all benchmark models, demonstrating its effectiveness.
AB - In recent years, with the widespread popularity of the Internet, social media has become an indispensable part of people's lives. People regard online social media as an essential tool for interaction and communication. Owing to the convenience of acquiring data from social media, mental health research based on social media has received considerable attention. Early detection of psychological disorders from social media can help prevent further deterioration in at-risk people. In this paper, depression detection is performed based on the non-verbal (acoustic and visual) behaviors in vlogs. We propose a time-aware attention-based multimodal fusion depression detection network (TAMFN) to fully mine and fuse multimodal features. The TAMFN model is composed of a temporal convolutional network with global information (GTCN), an intermodal feature extraction (IFE) module, and a time-aware attention multimodal fusion (TAMF) module. The GTCN captures more temporal behavior information by combining local and global temporal information. The IFE module extracts early interaction information between modalities to enrich the feature representation. The TAMF module guides multimodal feature fusion by mining the temporal importance of the different modalities. Our experiments are carried out on the D-Vlog dataset, and the comparative results show that the proposed TAMFN outperforms all benchmark models, demonstrating its effectiveness.
KW - Depression
KW - automatic detection
KW - non-verbal behaviors
KW - time-aware attention-based multimodal fusion depression detection network (TAMFN)
KW - vlog
UR - http://www.scopus.com/inward/record.url?scp=85144042686&partnerID=8YFLogxK
U2 - 10.1109/TNSRE.2022.3224135
DO - 10.1109/TNSRE.2022.3224135
M3 - Article
C2 - 36417750
AN - SCOPUS:85144042686
SN - 1534-4320
VL - 31
SP - 669
EP - 679
JO - IEEE Transactions on Neural Systems and Rehabilitation Engineering
JF - IEEE Transactions on Neural Systems and Rehabilitation Engineering
ER -