An Automatic Depression Detection Method with Cross-Modal Fusion Network and Multi-head Attention Mechanism

Yutong Li, Juan Wang, Zhenyu Liu*, Li Zhou, Haibo Zhang, Cheng Tang, Xiping Hu, Bin Hu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Audio-visual multimodal depression detection has attracted significant attention as an efficient and convenient computer-aided detection tool, and it has shown promising performance. In this paper, we propose CMAFN, a cross-modal fusion network based on multi-head attention and residual structures, for depression recognition. CMAFN consists of three core modules: the Local Temporal Feature Extract Block (LTF), the Cross-Modal Fusion Block (CFB), and the Multi-Head Temporal Attention Block (MTB). The LTF module extracts features and encodes temporal information for the audio and video modalities separately, while the CFB module enables complementary learning between the modalities. The MTB module models the temporal influence of all modalities on each unimodal branch. By combining these three modules, CMAFN refines inter-modality complementarity and intra-modality temporal dependencies, enabling interaction between the unimodal branches and an adaptive balance between modalities. Evaluation results on the widely used AVEC2013 and AVEC2014 depression datasets show that CMAFN outperforms state-of-the-art approaches for depression recognition. These results highlight the potential of CMAFN as an effective tool for the early detection and diagnosis of depression.
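The abstract names the three modules but does not describe their internals. The NumPy sketch below shows one plausible way an LTF, CFB, MTB pipeline with residual multi-head attention could be wired together. Everything here is an assumption and not the authors' implementation. This includes the depthwise 1-D convolution standing in for LTF, the bidirectional cross-attention with residuals standing in for CFB, and the attention over time-concatenated modalities standing in for MTB. It also includes the softmax gate used for "adaptive balance", the mean-pooled linear scoring head, and every function name (`local_temporal`, `cross_modal_fusion`, `temporal_attention`, `cmafn_like_forward`) and parameter shape.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_attention(rng, d):
    # Four random (d, d) projections: W_q, W_k, W_v, W_o (untrained).
    return [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4)]


def multi_head_attention(query, key_value, w_q, w_k, w_v, w_o, num_heads):
    # Scaled dot-product multi-head attention.
    # query: (T_q, d); key_value: (T_k, d); returns (T_q, d).
    t_q, d = query.shape
    d_h = d // num_heads
    q = (query @ w_q).reshape(t_q, num_heads, d_h).transpose(1, 0, 2)
    k = (key_value @ w_k).reshape(-1, num_heads, d_h).transpose(1, 0, 2)
    v = (key_value @ w_v).reshape(-1, num_heads, d_h).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_h)  # (H, T_q, T_k)
    heads = softmax(scores, axis=-1) @ v              # (H, T_q, d_h)
    # Concatenate the heads back to (T_q, d), then apply the output projection.
    return heads.transpose(1, 0, 2).reshape(t_q, d) @ w_o


def local_temporal(x, kernel):
    # Hypothetical LTF stand-in: a depthwise 1-D temporal convolution with
    # 'same' padding, applied to each feature channel of x: (T, d).
    return np.stack(
        [np.convolve(x[:, c], kernel, mode="same") for c in range(x.shape[1])], axis=1
    )


def cross_modal_fusion(audio, video, p_av, p_va, num_heads):
    # Hypothetical CFB stand-in: each branch queries the other modality, and a
    # residual connection preserves the original unimodal features.
    audio_out = audio + multi_head_attention(audio, video, *p_av, num_heads)
    video_out = video + multi_head_attention(video, audio, *p_va, num_heads)
    return audio_out, video_out


def temporal_attention(audio, video, p_a, p_v, num_heads):
    # Hypothetical MTB stand-in: keys/values are the time steps of all
    # modalities concatenated along time, so each unimodal branch attends
    # to the temporal context of both streams.
    joint = np.concatenate([audio, video], axis=0)  # (T_a + T_v, d)
    audio_out = audio + multi_head_attention(audio, joint, *p_a, num_heads)
    video_out = video + multi_head_attention(video, joint, *p_v, num_heads)
    return audio_out, video_out


def cmafn_like_forward(audio, video, params, num_heads=4):
    a = local_temporal(audio, params["kernel"])
    v = local_temporal(video, params["kernel"])
    a, v = cross_modal_fusion(a, v, params["cfb_av"], params["cfb_va"], num_heads)
    a, v = temporal_attention(a, v, params["mtb_a"], params["mtb_v"], num_heads)
    # "Adaptive balance" modelled here (assumption) as softmax weights over branches.
    w = softmax(params["gate_logits"])
    pooled = w[0] * a.mean(axis=0) + w[1] * v.mean(axis=0)  # (d,)
    return float(pooled @ params["head"])  # one scalar score per clip


rng = np.random.default_rng(0)
T, d = 16, 32
params = {
    "kernel": np.array([0.25, 0.5, 0.25]),
    "cfb_av": init_attention(rng, d),
    "cfb_va": init_attention(rng, d),
    "mtb_a": init_attention(rng, d),
    "mtb_v": init_attention(rng, d),
    "gate_logits": np.zeros(2),
    "head": rng.standard_normal(d) / np.sqrt(d),
}
audio = rng.standard_normal((T, d))  # synthetic per-frame audio features
video = rng.standard_normal((T, d))  # synthetic per-frame visual features
print(cmafn_like_forward(audio, video, params))
```

With untrained random weights the printed value has no meaning. It only confirms that the module shapes compose. In a trained model the projections, kernel, gate and head would be learned from labelled data. The residual additions follow the abstract's mention of residual structures, so each attention stage refines its branch instead of replacing it.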

Original language: English
Title of host publication: Pattern Recognition and Computer Vision - 6th Chinese Conference, PRCV 2023, Proceedings
Editors: Qingshan Liu, Hanzi Wang, Rongrong Ji, Zhanyu Ma, Weishi Zheng, Hongbin Zha, Xilin Chen, Liang Wang
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 252-264
Number of pages: 13
ISBN (Print): 9789819984688
DOIs
Publication status: Published - 2024
Externally published: Yes
Event: 6th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2023 - Xiamen, China
Duration: 13 Oct 2023 to 15 Oct 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14429 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 6th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2023
Country/Territory: China
City: Xiamen
Period: 13/10/23 to 15/10/23

Keywords

  • Automatic detection
  • Depression
  • Multi-modal fusion
  • Multimodal depression detection
