A Visually Interpretable Convolutional-Transformer Model for Assessing Depression from Facial Images

Yutong Li, Zhenyu Liu, Gang Li, Qiongqiong Chen, Zhijie Ding, Xiping Hu*, Bin Hu*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

2 Citations (Scopus)

Abstract

The accuracy and availability are the most critical and challenging problems for major depressive disorder (MDD) diagnosis. Limited receptive field and inaccurate visual interpretation always weaken the clinical application of deep learning-based depression recognition model. Thus, we propose a visually interpretable depression monitoring model termed Transformer and Convolutional with slot-attention (TC-slot) to assess depression from facial images. Specifically, this approach stands upon the intersection of convolution and transformer, combines self-attention mechanism and deep convolution, and uses a well-designed stem structure to explore the global and local relationships. Moreover, in TC-slot, a classifier built on slot-attention mechanism directly involved in the decision-making process further localizes salient regions of facial depression patterns and provides precise and meaningful explanations. The results indicate that the proposed approach effectively improves the classification and recognition performance compared with other state-of-the-art approaches, with guaranteed favorable visual interpretability, providing clinical insights into the assessment of the assessing depression.

Original languageEnglish
Title of host publicationProceedings - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023
PublisherIEEE Computer Society
Pages252-257
Number of pages6
ISBN (Electronic)9781665468916
DOIs
Publication statusPublished - 2023
Externally publishedYes
Event2023 IEEE International Conference on Multimedia and Expo, ICME 2023 - Brisbane, Australia
Duration: 10 Jul 202314 Jul 2023

Publication series

NameProceedings - IEEE International Conference on Multimedia and Expo
Volume2023-July
ISSN (Print)1945-7871
ISSN (Electronic)1945-788X

Conference

Conference2023 IEEE International Conference on Multimedia and Expo, ICME 2023
Country/TerritoryAustralia
CityBrisbane
Period10/07/2314/07/23

Keywords

  • Convolutional neural networks
  • Depression
  • Transformer
  • Visual interpretability

Fingerprint

Dive into the research topics of 'A Visually Interpretable Convolutional-Transformer Model for Assessing Depression from Facial Images'. Together they form a unique fingerprint.

Cite this