Abstract
Depression is one of the most common mental illnesses, yet few of the deep models proposed so far for social media data take both temporal and spatial information into account when detecting depression. In this paper, we present an efficient, low-covariance integrated multimodal spatio-temporal Transformer framework called DepMSTAT, which aims to detect depression using acoustic and visual features in social media data. The framework consists of four modules: a data preprocessing module, a token generation module, a Spatial-Temporal Attentional Transformer (STAT) module, and a depression classifier module. To efficiently capture spatial and temporal correlations in multimodal social media depression data, we propose a plug-and-play STAT module. The module extracts unimodal spatio-temporal features and fuses information across modalities, playing a key role in the analysis of acoustic and visual features in social media data. In extensive experiments on a depression database (D-Vlog), the proposed method achieves high accuracy (71.53%) in depression detection, outperforming most existing models. This work provides a foundation for multimodal studies that assist in the detection of depression.
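The abstract describes a four-module pipeline whose core is a STAT block that attends over both the temporal axis (frames over time) and the spatial axis (acoustic/visual feature tokens). The sketch below, in PyTorch, illustrates one plausible reading of such a block; the layer sizes, the temporal-then-spatial ordering, and all module names here are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class STATBlock(nn.Module):
    """Illustrative Spatial-Temporal Attentional Transformer (STAT) block.

    Assumption: input is a 4-D token tensor (batch, time, tokens, dim),
    where the token axis spans acoustic and visual features. The paper
    only states that the module captures spatial and temporal
    correlations; this decomposition is one possible realization.
    """

    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        self.norm3 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, s, d = x.shape
        # Temporal attention: attend across time steps for each token.
        xt = x.permute(0, 2, 1, 3).reshape(b * s, t, d)
        xt = xt + self.temporal_attn(xt, xt, xt, need_weights=False)[0]
        x = self.norm1(xt).reshape(b, s, t, d).permute(0, 2, 1, 3)
        # Spatial attention: attend across tokens at each time step.
        xs = x.reshape(b * t, s, d)
        xs = xs + self.spatial_attn(xs, xs, xs, need_weights=False)[0]
        x = self.norm2(xs).reshape(b, t, s, d)
        # Position-wise feed-forward with residual connection.
        return self.norm3(x + self.ffn(x))
```

Under this reading, the depression classifier module would pool the STAT output over time and tokens and apply a linear head for binary prediction; that pooling choice is likewise an assumption.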
| Original language | English |
| --- | --- |
| Pages (from-to) | 1-12 |
| Number of pages | 12 |
| Journal | IEEE Transactions on Knowledge and Data Engineering |
| DOIs | |
| Publication status | Accepted/In press - 2024 |
| Externally published | Yes |
Keywords
- Data mining
- Depression
- Depression detection
- Feature extraction
- Semantics
- Social networking (online)
- Spatio-temporal attention
- Transformer
- Transformers
- Visualization
- Vlog data