TY - GEN
T1 - A Visually Interpretable Convolutional-Transformer Model for Assessing Depression from Facial Images
AU - Li, Yutong
AU - Liu, Zhenyu
AU - Li, Gang
AU - Chen, Qiongqiong
AU - Ding, Zhijie
AU - Hu, Xiping
AU - Hu, Bin
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
AB - Accuracy and availability are the most critical and challenging problems in diagnosing major depressive disorder (MDD). A limited receptive field and inaccurate visual interpretation often weaken the clinical applicability of deep learning-based depression recognition models. We therefore propose a visually interpretable depression monitoring model, termed Transformer and Convolutional with slot-attention (TC-slot), to assess depression from facial images. Specifically, the approach sits at the intersection of convolution and transformer: it combines the self-attention mechanism with deep convolution and uses a well-designed stem structure to explore global and local relationships. Moreover, in TC-slot, a classifier built on the slot-attention mechanism is directly involved in the decision-making process, which further localizes salient regions of facial depression patterns and provides precise and meaningful explanations. The results indicate that the proposed approach improves classification and recognition performance compared with other state-of-the-art approaches while retaining favorable visual interpretability, providing clinical insights into the assessment of depression.
KW - Convolutional neural networks
KW - Depression
KW - Transformer
KW - Visual interpretability
UR - http://www.scopus.com/inward/record.url?scp=85171146306&partnerID=8YFLogxK
U2 - 10.1109/ICME55011.2023.00051
DO - 10.1109/ICME55011.2023.00051
M3 - Conference contribution
AN - SCOPUS:85171146306
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
SP - 252
EP - 257
BT - Proceedings - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023
PB - IEEE Computer Society
T2 - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023
Y2 - 10 July 2023 through 14 July 2023
ER -