Towards stable cross-domain depression recognition under missing modalities

  • Jiuyi Chen
  • Mingkui Tan
  • Haifeng Lu
  • Qiuna Xu
  • Zhihua Wang
  • Runhao Zeng*
  • Xiping Hu

*Corresponding author for this work

  • South China University of Technology
  • Peng Cheng Laboratory
  • Shenzhen MSU-BIT University
  • City University of Hong Kong
  • Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

Depression poses serious public health risks, including suicide, underscoring the urgency of timely and scalable screening. Multimodal automatic depression detection (ADD) offers a promising solution; however, widely studied audio- and video-based ADD methods lack a unified, generalizable framework for diverse depression recognition scenarios and show limited stability when modalities are missing, which is common in real-world data. In this work, we propose a unified framework for Stable Cross-Domain Depression Recognition based on a Multimodal Large Language Model (SCD-MLLM). The framework supports the integration and processing of heterogeneous depression-related data collected from varied sources while maintaining stability in the presence of incomplete modality inputs. Specifically, SCD-MLLM introduces two key components: (i) a Multi-Source Data Input Adapter (MDIA), which employs a masking mechanism and task-specific prompts to transform heterogeneous depression-related inputs into uniform token sequences, addressing inconsistency across diverse data sources; and (ii) a Modality-Aware Adaptive Fusion Module (MAFM), which adaptively integrates audio and visual features via a shared projection mechanism, enhancing resilience under missing-modality conditions. We conduct comprehensive experiments under multi-dataset joint training settings on five publicly available, heterogeneous depression datasets from diverse scenarios: CMDC, AVEC2014, DAIC-WOZ, DVlog, and EATD. Across both complete and partial modality settings, SCD-MLLM outperforms state-of-the-art (SOTA) models as well as leading commercial LLMs (Gemini and GPT), demonstrating superior cross-domain generalization, an enhanced ability to capture multimodal cues of depression, and strong stability in missing-modality cases in real-world applications.
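The idea of fusing audio and visual features through a shared projection while tolerating a missing modality can be illustrated with a small sketch. This is not the authors' MAFM: the function names, the softmax gating over a hypothetical `score_vector`, and the use of `None` to mark a missing modality are all assumptions made for illustration, and both modalities are assumed to have the same input dimension so that one projection matrix can serve both.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]


def project(x: Sequence[float], weight: Sequence[Sequence[float]]) -> Vector:
    """Apply one shared linear projection (no bias) to a feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def masked_fusion(
    audio: Optional[Sequence[float]],
    visual: Optional[Sequence[float]],
    shared_weight: Sequence[Sequence[float]],
    score_vector: Sequence[float],
) -> Vector:
    """Fuse whichever modalities are present; None marks a missing one.

    Hypothetical gating: each projected modality gets a scalar score
    (dot product with score_vector), and a softmax over the present
    modalities only turns those scores into fusion weights.
    """
    present = [m for m in (audio, visual) if m is not None]
    if not present:
        raise ValueError("at least one modality is required")

    # The same projection matrix is used for every modality.
    projected = [project(m, shared_weight) for m in present]

    # Numerically stable softmax restricted to the present modalities.
    scores = [sum(s * p for s, p in zip(score_vector, vec)) for vec in projected]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the projected features.
    dim = len(projected[0])
    return [sum(w * vec[i] for w, vec in zip(weights, projected)) for i in range(dim)]
```

Because the softmax runs only over the modalities that are present, a missing input contributes nothing and the remaining weights still sum to one; with a single modality the output is simply that modality's projected features.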

Original language: English
Article number: 113367
Journal: Pattern Recognition
Volume: 177
Publication status: Published - Sept 2026
Externally published: Yes

Keywords

  • Affective computing
  • Depression recognition
  • Missing modality
  • Multimodal large language model
