CAIINET: Neural network based on contextual attention and information interaction mechanism for depression detection

Li Zhou; Zhenyu Liu; Xiaoyan Yuan; Zixuan Shangguan; Yutong Li; Bin Hu

doi:10.1016/j.dsp.2023.103986

Li Zhou, Zhenyu Liu^*, Xiaoyan Yuan, Zixuan Shangguan, Yutong Li, Bin Hu

^* Corresponding author for this work

Lanzhou University

Research output: Contribution to journal › Article › peer-review

10 Citations (Scopus)

Abstract

Depression is a globally widespread psychological disorder that seriously affects patients' physical and mental health. Depression detection methods based on physiological signals are widely used, but such signals are not easy to collect. With the rapid development of social media, vlogs posted by users not only reflect their current emotional state but also make early depression detection possible, and the data are easier to obtain. Early depression detection based on social media has therefore become an active research topic. However, because users publish large and diverse social data, effectively extracting critical temporal information and fusing multimodal data remains an urgent problem. To detect depression early from vlog data, we propose a neural network based on contextual attention and information interaction mechanism (CAIINET). CAIINET is composed of three core modules: a BiLSTM based on contextual attention module (CAM-BiLSTM), a local information fusion module (LIFM), and a global information interaction module (GIIM). The CAM-BiLSTM module captures important acoustic and visual features at critical time points. The LIFM and GIIM modules extract the relevance and interactivity between the extracted acoustic and visual features at local and global scales. Experiments on the D-Vlog dataset show that CAIINET achieves a weighted average precision, recall and F1 score of 66.56%, 66.98% and 66.55%, respectively, outperforming ten benchmark models. These results indicate that CAIINET has good depression detection capability, and an ablation experiment examines the effectiveness of its three submodules.
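The abstract reports weighted average precision, recall and F1 score. As a minimal sketch, assuming "weighted average" means per-class scores averaged with weights proportional to each class's number of true samples (the convention scikit-learn calls average="weighted"; the abstract does not state the exact definition), the computation might look like the following. The function name `weighted_prf` and the toy labels are hypothetical and are not taken from the paper or the D-Vlog dataset.

```python
from collections import Counter


def weighted_prf(y_true, y_pred):
    """Support-weighted precision, recall and F1 over the classes in y_true."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    support = Counter(y_true)      # number of true samples per class
    predicted = Counter(y_pred)    # number of predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label, n_true in support.items():
        tp = correct[label]
        # A class that is never predicted gets precision 0 (an assumption).
        p = tp / predicted[label] if predicted[label] else 0.0
        r = tp / n_true
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = n_true / total    # class share of the true labels
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return precision, recall, f1


# Toy labels (hypothetical): 1 = depressed, 0 = not depressed.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]
precision, recall, f1 = weighted_prf(y_true, y_pred)
```

Under this definition the weighted recall equals overall accuracy, which is one reason the three reported figures can sit close together on a dataset with moderately imbalanced classes.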

Original language	English
Article number	103986
Journal	Digital Signal Processing: A Review Journal
Volume	137
DOI	https://doi.org/10.1016/j.dsp.2023.103986
Publication status	Published - 15 Jun 2023
Externally published	Yes

Cite this

@article{eac66b1620f8442d8e61dd104fae6c88,

title = "CAIINET: Neural network based on contextual attention and information interaction mechanism for depression detection",

abstract = "Depression is a globally widespread psychological disorder that has a serious impact on the physical and mental health of patients. Currently, depression detection methods based on physiological signals are widely used, but the limitation is that physiological signals are not easy to collect. With the rapid development of social media, vlogs posted by users not only reflect the current emotional state, but also provide the possibility of early depression detection, and the data are more easily obtained. Therefore, early depression detection based on social media has become a hot research topic. However, due to the large and diverse social data that users may publish, how to effectively extract critical temporal information and fuse multiple modal data becomes an urgent problem to be solved. To realize the early detection of depression on vlog data, we propose a neural network based on contextual attention and information interaction mechanism (CAIINET). CAIINET is composed of three core modules: BiLSTM based on contextual attention module (CAM-BilSTM), local information fusion module (LIFM), and global information interaction module (GIIM). The CAM-BilSTM model captures important acoustic and visual features at critical time points. The LIFM and GIIM modules extract the relevance and interactivity between extracted acoustic and visual features at local and global scales. Experiments are conducted on the D-Vlog dataset, and the CAIINET model achieves 66.56%, 66.98% and 66.55% for weighted average precision, recall and F1 score, respectively, outperforming the ten benchmark models. The experimental results show that the CAIINET model has good depression detection capability, and furthermore, the effectiveness of the three submodules of the CAIINET model is investigated by the ablation experiment.",

keywords = "BiLSTM based on contextual attention (CAM-BilSTM), Depression detection, Global information interaction module (GIIM), Local information fusion module (LIFM), Vlog",

author = "Li Zhou and Zhenyu Liu and Xiaoyan Yuan and Zixuan Shangguan and Yutong Li and Bin Hu",

note = "Publisher Copyright: {\textcopyright} 2023",

year = "2023",

month = jun,

day = "15",

doi = "10.1016/j.dsp.2023.103986",

language = "English",

volume = "137",

journal = "Digital Signal Processing: A Review Journal",

issn = "1051-2004",

publisher = "Elsevier Inc.",

}

TY - JOUR

T1 - CAIINET

T2 - Neural network based on contextual attention and information interaction mechanism for depression detection

AU - Zhou, Li

AU - Liu, Zhenyu

AU - Yuan, Xiaoyan

AU - Shangguan, Zixuan

AU - Li, Yutong

AU - Hu, Bin

PY - 2023/6/15

Y1 - 2023/6/15

N2 - Depression is a globally widespread psychological disorder that has a serious impact on the physical and mental health of patients. Currently, depression detection methods based on physiological signals are widely used, but the limitation is that physiological signals are not easy to collect. With the rapid development of social media, vlogs posted by users not only reflect the current emotional state, but also provide the possibility of early depression detection, and the data are more easily obtained. Therefore, early depression detection based on social media has become a hot research topic. However, due to the large and diverse social data that users may publish, how to effectively extract critical temporal information and fuse multiple modal data becomes an urgent problem to be solved. To realize the early detection of depression on vlog data, we propose a neural network based on contextual attention and information interaction mechanism (CAIINET). CAIINET is composed of three core modules: BiLSTM based on contextual attention module (CAM-BilSTM), local information fusion module (LIFM), and global information interaction module (GIIM). The CAM-BilSTM model captures important acoustic and visual features at critical time points. The LIFM and GIIM modules extract the relevance and interactivity between extracted acoustic and visual features at local and global scales. Experiments are conducted on the D-Vlog dataset, and the CAIINET model achieves 66.56%, 66.98% and 66.55% for weighted average precision, recall and F1 score, respectively, outperforming the ten benchmark models. The experimental results show that the CAIINET model has good depression detection capability, and furthermore, the effectiveness of the three submodules of the CAIINET model is investigated by the ablation experiment.

AB - Depression is a globally widespread psychological disorder that has a serious impact on the physical and mental health of patients. Currently, depression detection methods based on physiological signals are widely used, but the limitation is that physiological signals are not easy to collect. With the rapid development of social media, vlogs posted by users not only reflect the current emotional state, but also provide the possibility of early depression detection, and the data are more easily obtained. Therefore, early depression detection based on social media has become a hot research topic. However, due to the large and diverse social data that users may publish, how to effectively extract critical temporal information and fuse multiple modal data becomes an urgent problem to be solved. To realize the early detection of depression on vlog data, we propose a neural network based on contextual attention and information interaction mechanism (CAIINET). CAIINET is composed of three core modules: BiLSTM based on contextual attention module (CAM-BilSTM), local information fusion module (LIFM), and global information interaction module (GIIM). The CAM-BilSTM model captures important acoustic and visual features at critical time points. The LIFM and GIIM modules extract the relevance and interactivity between extracted acoustic and visual features at local and global scales. Experiments are conducted on the D-Vlog dataset, and the CAIINET model achieves 66.56%, 66.98% and 66.55% for weighted average precision, recall and F1 score, respectively, outperforming the ten benchmark models. The experimental results show that the CAIINET model has good depression detection capability, and furthermore, the effectiveness of the three submodules of the CAIINET model is investigated by the ablation experiment.

KW - BiLSTM based on contextual attention (CAM-BilSTM)

KW - Depression detection

KW - Global information interaction module (GIIM)

KW - Local information fusion module (LIFM)

KW - Vlog

UR - http://www.scopus.com/inward/record.url?scp=85151564362&partnerID=8YFLogxK

U2 - 10.1016/j.dsp.2023.103986

DO - 10.1016/j.dsp.2023.103986

M3 - Article

AN - SCOPUS:85151564362

SN - 1051-2004

VL - 137

JO - Digital Signal Processing: A Review Journal

JF - Digital Signal Processing: A Review Journal

M1 - 103986

ER -
