An Automatic Depression Detection Method with Cross-Modal Fusion Network and Multi-head Attention Mechanism

Yutong Li; Juan Wang; Zhenyu Liu; Li Zhou; Haibo Zhang; Cheng Tang; Xiping Hu; Bin Hu

doi:10.1007/978-981-99-8469-5_20

Yutong Li, Juan Wang, Zhenyu Liu^*, Li Zhou, Haibo Zhang, Cheng Tang, Xiping Hu, Bin Hu

^*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

Audio-visual based multimodal depression detection has gained significant attention due to its high efficiency and convenience as a computer-aided detection tool, resulting in promising performance. In this paper, we propose a cross-modal fusion network based on multi-head attention and residual structures (CMAFN) for depression recognition. CMAFN consists of three core modules: the Local Temporal Feature Extract Block (LTF), the Cross-Model Fusion Block (CFB), and the Multi-Head Temporal Attention Block (MTB). The LTF module performs feature extraction and encodes temporal information for audio and video modalities separately, while the CFB module facilitates complementary learning between the modalities. The MTB module accounts for the temporal influence of all modalities on each unimodal branch. With the incorporation of the three well-designed modules, CMAFN can refine the inter-modality complementarity and intra-modality temporal dependencies, achieving the interaction between unimodal branches and adaptive balance between modalities. Evaluation results on widely used depression datasets, AVEC2013 and AVEC2014, demonstrate that the proposed CMAFN method outperforms state-of-the-art approaches for depression recognition tasks. The results highlight the potential of CMAFN as an effective tool for the early detection and diagnosis of depression.
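The complementary learning between modalities described in the abstract can be illustrated with a toy example. The sketch below is not the authors' CMAFN implementation: it assumes a single attention head, omits the learned query/key/value projections, and uses made-up two-dimensional frame features. The names `cross_attend`, `audio`, and `video` are hypothetical. It only shows the general pattern of one modality attending over the other, followed by a residual connection.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query_seq, context_seq):
    """Single-head scaled dot-product cross-attention with a residual.

    Each frame of query_seq attends over every frame of context_seq
    (used as both keys and values). The attended vector is then added
    back to the query frame. Learned projections are omitted here,
    which is an assumption made to keep the sketch short.
    """
    d = len(query_seq[0])
    fused = []
    for q in query_seq:
        # Similarity of this query frame to every context frame.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in context_seq
        ]
        weights = softmax(scores)
        # Weighted average of the context frames.
        attended = [
            sum(w * v[j] for w, v in zip(weights, context_seq))
            for j in range(d)
        ]
        # Residual connection: keep the original unimodal information.
        fused.append([qi + ai for qi, ai in zip(q, attended)])
    return fused


# Toy features (hypothetical values): 3 audio frames and 2 video frames.
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
video = [[0.5, 0.5], [1.0, -1.0]]

audio_fused = cross_attend(audio, video)  # audio enriched by video
video_fused = cross_attend(video, audio)  # video enriched by audio
```

Each output sequence keeps the length of its own modality, so the two unimodal branches stay separate while still exchanging information.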

Original language	English
Title of host publication	Pattern Recognition and Computer Vision - 6th Chinese Conference, PRCV 2023, Proceedings
Editors	Qingshan Liu, Hanzi Wang, Rongrong Ji, Zhanyu Ma, Weishi Zheng, Hongbin Zha, Xilin Chen, Liang Wang
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	252-264
Number of pages	13
ISBN (Print)	9789819984688
DOI	https://doi.org/10.1007/978-981-99-8469-5_20
Publication status	Published - 2024
Externally published	Yes
Event	6th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2023 - Xiamen, China. Duration: 13 Oct 2023 → 15 Oct 2023

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	14429 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	6th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2023
Country/Territory	China
City	Xiamen
Period	13/10/23 → 15/10/23

Cite this

Li, Y., Wang, J., Liu, Z., Zhou, L., Zhang, H., Tang, C., Hu, X., & Hu, B. (2024). An Automatic Depression Detection Method with Cross-Modal Fusion Network and Multi-head Attention Mechanism. In Q. Liu, H. Wang, R. Ji, Z. Ma, W. Zheng, H. Zha, X. Chen, & L. Wang (Eds.), Pattern Recognition and Computer Vision - 6th Chinese Conference, PRCV 2023, Proceedings (pp. 252-264). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 14429 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-99-8469-5_20

Li, Yutong ; Wang, Juan ; Liu, Zhenyu et al. / An Automatic Depression Detection Method with Cross-Modal Fusion Network and Multi-head Attention Mechanism. Pattern Recognition and Computer Vision - 6th Chinese Conference, PRCV 2023, Proceedings. editor / Qingshan Liu ; Hanzi Wang ; Rongrong Ji ; Zhanyu Ma ; Weishi Zheng ; Hongbin Zha ; Xilin Chen ; Liang Wang. Springer Science and Business Media Deutschland GmbH, 2024. pp. 252-264 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{e4023df4bcc1490ca7fac466e28b5dc7,

title = "An Automatic Depression Detection Method with Cross-Modal Fusion Network and Multi-head Attention Mechanism",

abstract = "Audio-visual based multimodal depression detection has gained significant attention due to its high efficiency and convenience as a computer-aided detection tool, resulting in promising performance. In this paper, we propose a cross-modal fusion network based on multi-head attention and residual structures (CMAFN) for depression recognition. CMAFN consists of three core modules: the Local Temporal Feature Extract Block (LTF), the Cross-Model Fusion Block (CFB), and the Multi-Head Temporal Attention Block (MTB). The LTF module performs feature extraction and encodes temporal information for audio and video modalities separately, while the CFB module facilitates complementary learning between the modalities. The MTB module accounts for the temporal influence of all modalities on each unimodal branch. With the incorporation of the three well-designed modules, CMAFN can refine the inter-modality complementarity and intra-modality temporal dependencies, achieving the interaction between unimodal branches and adaptive balance between modalities. Evaluation results on widely used depression datasets, AVEC2013 and AVEC2014, demonstrate that the proposed CMAFN method outperforms state-of-the-art approaches for depression recognition tasks. The results highlight the potential of CMAFN as an effective tool for the early detection and diagnosis of depression.",

keywords = "Automatic detection, Depression, Multi-modal fusion, Multimodal depression detection",

author = "Yutong Li and Juan Wang and Zhenyu Liu and Li Zhou and Haibo Zhang and Cheng Tang and Xiping Hu and Bin Hu",

note = "Publisher Copyright: {\textcopyright} 2024, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.; 6th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2023 ; Conference date: 13-10-2023 Through 15-10-2023",

year = "2024",

doi = "10.1007/978-981-99-8469-5_20",

language = "English",

isbn = "9789819984688",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "252--264",

editor = "Qingshan Liu and Hanzi Wang and Rongrong Ji and Zhanyu Ma and Weishi Zheng and Hongbin Zha and Xilin Chen and Liang Wang",

booktitle = "Pattern Recognition and Computer Vision - 6th Chinese Conference, PRCV 2023, Proceedings",

address = "Germany",

}

Li, Y, Wang, J, Liu, Z, Zhou, L, Zhang, H, Tang, C, Hu, X & Hu, B 2024, An Automatic Depression Detection Method with Cross-Modal Fusion Network and Multi-head Attention Mechanism. in Q Liu, H Wang, R Ji, Z Ma, W Zheng, H Zha, X Chen & L Wang (eds), Pattern Recognition and Computer Vision - 6th Chinese Conference, PRCV 2023, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 14429 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 252-264, 6th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2023, Xiamen, China, 13/10/23. https://doi.org/10.1007/978-981-99-8469-5_20

An Automatic Depression Detection Method with Cross-Modal Fusion Network and Multi-head Attention Mechanism. / Li, Yutong; Wang, Juan; Liu, Zhenyu et al.
Pattern Recognition and Computer Vision - 6th Chinese Conference, PRCV 2023, Proceedings. editor / Qingshan Liu; Hanzi Wang; Rongrong Ji; Zhanyu Ma; Weishi Zheng; Hongbin Zha; Xilin Chen; Liang Wang. Springer Science and Business Media Deutschland GmbH, 2024. p. 252-264 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 14429 LNCS).

TY - GEN

T1 - An Automatic Depression Detection Method with Cross-Modal Fusion Network and Multi-head Attention Mechanism

AU - Li, Yutong

AU - Wang, Juan

AU - Liu, Zhenyu

AU - Zhou, Li

AU - Zhang, Haibo

AU - Tang, Cheng

AU - Hu, Xiping

AU - Hu, Bin

PY - 2024

Y1 - 2024

N2 - Audio-visual based multimodal depression detection has gained significant attention due to its high efficiency and convenience as a computer-aided detection tool, resulting in promising performance. In this paper, we propose a cross-modal fusion network based on multi-head attention and residual structures (CMAFN) for depression recognition. CMAFN consists of three core modules: the Local Temporal Feature Extract Block (LTF), the Cross-Model Fusion Block (CFB), and the Multi-Head Temporal Attention Block (MTB). The LTF module performs feature extraction and encodes temporal information for audio and video modalities separately, while the CFB module facilitates complementary learning between the modalities. The MTB module accounts for the temporal influence of all modalities on each unimodal branch. With the incorporation of the three well-designed modules, CMAFN can refine the inter-modality complementarity and intra-modality temporal dependencies, achieving the interaction between unimodal branches and adaptive balance between modalities. Evaluation results on widely used depression datasets, AVEC2013 and AVEC2014, demonstrate that the proposed CMAFN method outperforms state-of-the-art approaches for depression recognition tasks. The results highlight the potential of CMAFN as an effective tool for the early detection and diagnosis of depression.

AB - Audio-visual based multimodal depression detection has gained significant attention due to its high efficiency and convenience as a computer-aided detection tool, resulting in promising performance. In this paper, we propose a cross-modal fusion network based on multi-head attention and residual structures (CMAFN) for depression recognition. CMAFN consists of three core modules: the Local Temporal Feature Extract Block (LTF), the Cross-Model Fusion Block (CFB), and the Multi-Head Temporal Attention Block (MTB). The LTF module performs feature extraction and encodes temporal information for audio and video modalities separately, while the CFB module facilitates complementary learning between the modalities. The MTB module accounts for the temporal influence of all modalities on each unimodal branch. With the incorporation of the three well-designed modules, CMAFN can refine the inter-modality complementarity and intra-modality temporal dependencies, achieving the interaction between unimodal branches and adaptive balance between modalities. Evaluation results on widely used depression datasets, AVEC2013 and AVEC2014, demonstrate that the proposed CMAFN method outperforms state-of-the-art approaches for depression recognition tasks. The results highlight the potential of CMAFN as an effective tool for the early detection and diagnosis of depression.

KW - Automatic detection

KW - Depression

KW - Multi-modal fusion

KW - Multimodal depression detection

UR - http://www.scopus.com/inward/record.url?scp=85180752139&partnerID=8YFLogxK

U2 - 10.1007/978-981-99-8469-5_20

DO - 10.1007/978-981-99-8469-5_20

M3 - Conference contribution

AN - SCOPUS:85180752139

SN - 9789819984688

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 252

EP - 264

BT - Pattern Recognition and Computer Vision - 6th Chinese Conference, PRCV 2023, Proceedings

A2 - Liu, Qingshan

A2 - Wang, Hanzi

A2 - Ji, Rongrong

A2 - Ma, Zhanyu

A2 - Zheng, Weishi

A2 - Zha, Hongbin

A2 - Chen, Xilin

A2 - Wang, Liang

PB - Springer Science and Business Media Deutschland GmbH

T2 - 6th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2023

Y2 - 13 October 2023 through 15 October 2023

ER -

Li Y, Wang J, Liu Z, Zhou L, Zhang H, Tang C et al. An Automatic Depression Detection Method with Cross-Modal Fusion Network and Multi-head Attention Mechanism. In Liu Q, Wang H, Ji R, Ma Z, Zheng W, Zha H, Chen X, Wang L, editors, Pattern Recognition and Computer Vision - 6th Chinese Conference, PRCV 2023, Proceedings. Springer Science and Business Media Deutschland GmbH. 2024. p. 252-264. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-981-99-8469-5_20
