A Visually Interpretable Convolutional-Transformer Model for Assessing Depression from Facial Images

Yutong Li; Zhenyu Liu; Gang Li; Qiongqiong Chen; Zhijie Ding; Xiping Hu; Bin Hu

doi:10.1109/ICME55011.2023.00051

Yutong Li, Zhenyu Liu, Gang Li, Qiongqiong Chen, Zhijie Ding, Xiping Hu*, Bin Hu*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract

Accuracy and availability are the most critical and challenging problems in diagnosing major depressive disorder (MDD). A limited receptive field and inaccurate visual interpretation weaken the clinical applicability of deep learning-based depression recognition models. We therefore propose a visually interpretable depression monitoring model, termed Transformer and Convolutional with slot-attention (TC-slot), to assess depression from facial images. Specifically, the approach sits at the intersection of convolution and transformer: it combines a self-attention mechanism with deep convolution and uses a well-designed stem structure to explore global and local relationships. Moreover, in TC-slot, a classifier built on the slot-attention mechanism is directly involved in the decision-making process; it further localizes salient regions of facial depression patterns and provides precise and meaningful explanations. The results indicate that the proposed approach improves classification and recognition performance compared with other state-of-the-art approaches while retaining favorable visual interpretability, providing clinical insights into the assessment of depression.
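
The abstract gives no implementation details, so the following is only an illustrative sketch of how a slot-attention-style classifier can produce class scores and per-class attention maps from the same computation. The function name `slot_attention_scores`, the use of one slot per class, the single attention step, and scoring each class by its mean attention mass are all assumptions made for illustration; they are not taken from the TC-slot paper.

```python
import math


def slot_attention_scores(features, slots):
    """Toy slot-attention readout (hypothetical, not the TC-slot design).

    features: N feature vectors of length D, one per spatial position.
    slots:    K slot vectors of length D, here assumed to be one per class.
    Returns (scores, attn), where attn[k][n] is the share of position n
    claimed by slot k and scores[k] is the mean of attn[k] over positions.
    """
    dim = len(slots[0])
    scale = 1.0 / math.sqrt(dim)
    n_pos, n_slots = len(features), len(slots)
    attn = [[0.0] * n_pos for _ in range(n_slots)]

    for n, feat in enumerate(features):
        # Scaled dot product between every slot and this position.
        logits = [scale * sum(s * f for s, f in zip(slot, feat)) for slot in slots]
        # Softmax over slots, so slots compete for each position.
        peak = max(logits)
        exps = [math.exp(value - peak) for value in logits]
        total = sum(exps)
        for k in range(n_slots):
            attn[k][n] = exps[k] / total

    # Assumed scoring rule: a class is favored when its slot claims more positions.
    scores = [sum(row) / n_pos for row in attn]
    return scores, attn


# Three positions with 2-D features and two slots (made-up values).
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
slots = [[2.0, 0.0], [0.0, 2.0]]
scores, attn = slot_attention_scores(features, slots)
```

In this sketch the rows of `attn` double as the explanation: reshaped to the feature-map grid, `attn[k]` shows which positions supported class `k`, which is the sense in which the attention is part of the decision rather than a post-hoc visualization.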

Original language	English
Title of host publication	Proceedings - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023
Publisher	IEEE Computer Society
Pages	252-257
Number of pages	6
ISBN (Electronic)	9781665468916
DOIs	https://doi.org/10.1109/ICME55011.2023.00051
Publication status	Published - 2023
Externally published	Yes
Event	2023 IEEE International Conference on Multimedia and Expo, ICME 2023 - Brisbane, Australia; Duration: 10 Jul 2023 → 14 Jul 2023

Publication series

Name	Proceedings - IEEE International Conference on Multimedia and Expo
Volume	2023-July
ISSN (Print)	1945-7871
ISSN (Electronic)	1945-788X

Conference

Conference	2023 IEEE International Conference on Multimedia and Expo, ICME 2023
Country/Territory	Australia
City	Brisbane
Period	10/07/23 → 14/07/23

Keywords

Convolutional neural networks
Depression
Transformer
Visual interpretability

Access to Document

10.1109/ICME55011.2023.00051

Cite this

Li, Y., Liu, Z., Li, G., Chen, Q., Ding, Z., Hu, X., & Hu, B. (2023). A Visually Interpretable Convolutional-Transformer Model for Assessing Depression from Facial Images. In Proceedings - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023 (pp. 252-257). (Proceedings - IEEE International Conference on Multimedia and Expo; Vol. 2023-July). IEEE Computer Society. https://doi.org/10.1109/ICME55011.2023.00051

@inproceedings{c3832e50c2714364b787fd5c9b546d34,

title = "A Visually Interpretable Convolutional-Transformer Model for Assessing Depression from Facial Images",

keywords = "Convolutional neural networks, Depression, Transformer, Visual interpretability",

author = "Yutong Li and Zhenyu Liu and Gang Li and Qiongqiong Chen and Zhijie Ding and Xiping Hu and Bin Hu",

note = "Publisher Copyright: {\textcopyright} 2023 IEEE.; 2023 IEEE International Conference on Multimedia and Expo, ICME 2023 ; Conference date: 10-07-2023 Through 14-07-2023",

year = "2023",

doi = "10.1109/ICME55011.2023.00051",

language = "English",

series = "Proceedings - IEEE International Conference on Multimedia and Expo",

publisher = "IEEE Computer Society",

pages = "252--257",

booktitle = "Proceedings - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023",

address = "United States",

}

Li, Y, Liu, Z, Li, G, Chen, Q, Ding, Z, Hu, X & Hu, B 2023, A Visually Interpretable Convolutional-Transformer Model for Assessing Depression from Facial Images. in Proceedings - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023. Proceedings - IEEE International Conference on Multimedia and Expo, vol. 2023-July, IEEE Computer Society, pp. 252-257, 2023 IEEE International Conference on Multimedia and Expo, ICME 2023, Brisbane, Australia, 10/07/23. https://doi.org/10.1109/ICME55011.2023.00051

A Visually Interpretable Convolutional-Transformer Model for Assessing Depression from Facial Images. / Li, Yutong; Liu, Zhenyu; Li, Gang et al.
Proceedings - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023. IEEE Computer Society, 2023. p. 252-257 (Proceedings - IEEE International Conference on Multimedia and Expo; Vol. 2023-July).

TY - GEN

T1 - A Visually Interpretable Convolutional-Transformer Model for Assessing Depression from Facial Images

AU - Li, Yutong

AU - Liu, Zhenyu

AU - Li, Gang

AU - Chen, Qiongqiong

AU - Ding, Zhijie

AU - Hu, Xiping

AU - Hu, Bin

PY - 2023

Y1 - 2023

KW - Convolutional neural networks

KW - Depression

KW - Transformer

KW - Visual interpretability

UR - http://www.scopus.com/inward/record.url?scp=85171146306&partnerID=8YFLogxK

U2 - 10.1109/ICME55011.2023.00051

DO - 10.1109/ICME55011.2023.00051

M3 - Conference contribution

AN - SCOPUS:85171146306

T3 - Proceedings - IEEE International Conference on Multimedia and Expo

SP - 252

EP - 257

BT - Proceedings - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023

PB - IEEE Computer Society

T2 - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023

Y2 - 10 July 2023 through 14 July 2023

ER -

Li Y, Liu Z, Li G, Chen Q, Ding Z, Hu X et al. A Visually Interpretable Convolutional-Transformer Model for Assessing Depression from Facial Images. In Proceedings - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023. IEEE Computer Society. 2023. p. 252-257. (Proceedings - IEEE International Conference on Multimedia and Expo). doi: 10.1109/ICME55011.2023.00051
