DepMSTAT: Multimodal Spatio-Temporal Attentional Transformer for Depression Detection

  • Yongfeng Tao
  • Minqiang Yang
  • Huiru Li
  • Yushan Wu
  • Bin Hu*

  *Corresponding author for this work

Research output: Contribution to journal (Article, peer-reviewed)

Abstract

Depression is one of the most common mental illnesses, yet few of the deep models proposed so far for social media data consider both the temporal and the spatial information in that data when detecting depression. In this paper, we present DepMSTAT, an efficient, low-covariance multimodal spatio-temporal Transformer framework that detects depression from acoustic and visual features in social media data. The framework consists of four modules: a data preprocessing module, a token generation module, a Spatial-Temporal Attentional Transformer (STAT) module, and a depression classifier module. The plug-and-play STAT module is proposed to capture spatial and temporal correlations in multimodal social media depression data efficiently: it extracts unimodal spatio-temporal features and fuses the unimodal information, and so plays a key role in the analysis of the acoustic and visual features. In extensive experiments on the D-Vlog depression database, the method reaches an accuracy of 71.53% in depression detection, a performance that exceeds that of most models. This work provides a scaffold for studies that use multimodal data to assist in the detection of depression.

Original language: English
Pages (from-to): 2956-2966
Number of pages: 11
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 36
Issue number: 7
DOIs
Publication status: Published - 1 Jul 2024
Externally published: Yes

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • Depression detection
  • spatio-temporal attention
  • transformer
  • vlog data
