TY - JOUR
T1 - Anxiety recognition based on multimodal social media data and cross-attention mechanism
AU - Zhu, Jianghong
AU - Zhang, Zhenwen
AU - Li, Zepeng
AU - Hu, Bin
N1 - Publisher Copyright: © 2025 Elsevier B.V.
PY - 2025/9/14
Y1 - 2025/9/14
N2 - Anxiety disorder is a common mental illness that involves persistent and recurrent episodes of intense anxiety and sudden feelings of fear or terror, which seriously affect the patient's study, work and life. Social media contains a large amount of multimodal data reflecting people's inner activities and emotional states, providing a new way to recognize anxiety disorders. However, existing methods tend to focus excessively on text data and neglect the role of other types of data such as images. To address this issue, this paper proposes a Multimodal Anxiety Recognition model based on Image Description and a Cross-Attention mechanism (MAR-IDCA). First, for each image in a user's post, a caption generation model is applied to obtain the abstract semantic information of the image, and any text appearing in the image is extracted through optical character recognition; together these form the description of the image. Second, an image encoder and two text encoders extract the visual features of the image and the textual features of the image description and the post text, respectively. Then, a multi-head cross-attention mechanism is employed to fuse the visual, image description and post text features. Finally, the fused features are combined with other auxiliary information from the user's posts to obtain the final multimodal representation, which is then classified. Experiments on two multimodal anxiety recognition datasets show that MAR-IDCA achieves better anxiety recognition performance than existing models.
AB - Anxiety disorder is a common mental illness that involves persistent and recurrent episodes of intense anxiety and sudden feelings of fear or terror, which seriously affect the patient's study, work and life. Social media contains a large amount of multimodal data reflecting people's inner activities and emotional states, providing a new way to recognize anxiety disorders. However, existing methods tend to focus excessively on text data and neglect the role of other types of data such as images. To address this issue, this paper proposes a Multimodal Anxiety Recognition model based on Image Description and a Cross-Attention mechanism (MAR-IDCA). First, for each image in a user's post, a caption generation model is applied to obtain the abstract semantic information of the image, and any text appearing in the image is extracted through optical character recognition; together these form the description of the image. Second, an image encoder and two text encoders extract the visual features of the image and the textual features of the image description and the post text, respectively. Then, a multi-head cross-attention mechanism is employed to fuse the visual, image description and post text features. Finally, the fused features are combined with other auxiliary information from the user's posts to obtain the final multimodal representation, which is then classified. Experiments on two multimodal anxiety recognition datasets show that MAR-IDCA achieves better anxiety recognition performance than existing models.
KW - Anxiety recognition
KW - Cross-attention mechanism
KW - Image description
KW - Multimodal data
KW - Social media
UR - http://www.scopus.com/inward/record.url?scp=105006656740&partnerID=8YFLogxK
U2 - 10.1016/j.neucom.2025.130473
DO - 10.1016/j.neucom.2025.130473
M3 - Article
AN - SCOPUS:105006656740
SN - 0925-2312
VL - 646
JO - Neurocomputing
JF - Neurocomputing
M1 - 130473
ER -