TY - JOUR
T1 - Depressive semantic awareness from vlog facial and vocal streams via spatio-temporal transformer
AU - Tao, Yongfeng
AU - Yang, Minqiang
AU - Wu, Yushan
AU - Lee, Kevin
AU - Kline, Adrienne
AU - Hu, Bin
N1 - Publisher Copyright: © 2023 Chongqing University of Posts and Telecommunications
PY - 2024/6
Y1 - 2024/6
AB - With the rapid growth of information transmission via the Internet, efforts have been made to reduce network load and improve efficiency. One such application is semantic computing, which can extract and process semantic information from communication. Social media enables users to share their current emotions, opinions, and life events through their mobile devices. Notably, people suffering from mental health problems are more willing to share their feelings on social networks. It is therefore necessary to extract semantic information from social media (vlog data) to identify abnormal emotional states and facilitate early identification and intervention. Most studies do not consider spatio-temporal information when fusing multimodal information to identify abnormal emotional states such as depression. To solve this problem, this paper proposes a spatio-temporal squeeze transformer method for extracting semantic features of depression. First, a module with spatio-temporal data is embedded into the transformer encoder to obtain a representation of spatio-temporal features. Second, a classifier with a voting mechanism is designed to encourage the model to distinguish depression from non-depression effectively. Experiments are conducted on the D-Vlog dataset. The results show that the method is effective, and the accuracy rate can reach 70.70%. This work provides scaffolding for future work on affect recognition in semantic communication based on social media vlog data.
KW - Depression recognition
KW - Emotional computing
KW - Semantic awareness
KW - Vlog data
UR - http://www.scopus.com/inward/record.url?scp=85196836048&partnerID=8YFLogxK
U2 - 10.1016/j.dcan.2023.03.007
DO - 10.1016/j.dcan.2023.03.007
M3 - Article
AN - SCOPUS:85196836048
SN - 2468-5925
VL - 10
SP - 577
EP - 585
JO - Digital Communications and Networks
JF - Digital Communications and Networks
IS - 3
ER -