Facial Depression Estimation via Multi-Cue Contrastive Learning

Xinke Wang, Jingyuan Xu, Xiao Sun*, Mingzheng Li, Bin Hu, Wei Qian, Dan Guo*, Meng Wang*

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

Abstract

Vision-based depression estimation is an emerging yet impactful task, whose challenge lies in predicting the severity of depression from facial videos lasting at least several minutes. Existing methods primarily focus on fusing frame-level features to create comprehensive representations. However, they often overlook two crucial aspects: 1) inter- and intra-cue correlations, and 2) variations among samples. Hence, simply characterizing sample embeddings while ignoring to mine the relation among multiple cues leads to limitations. To address this problem, we propose a novel Multi-Cue Contrastive Learning (MCCL) framework to mine the relation among multiple cues for discriminative representation. Specifically, we first introduce a novel cross-characteristic attentive interaction module to model the relationship among multiple cues from four facial features (e.g., 3D landmarks, head poses, gazes, FAUs). Then, we propose a temporal segment attentive interaction module to capture the temporal relationships within each facial feature over time intervals. Moreover, we integrate contrastive learning to leverage the variations among samples by regarding the embeddings of inter-cue and intra-cue as positive pairs while considering embeddings from other samples as negative. In this way, the proposed MCCL framework leverages the relationships among the facial features and the variations among samples to enhance the process of multi-cue mining, thereby achieving more accurate facial depression estimation. Extensive experiments on public datasets, DAIC-WOZ, CMDC, and E-DAIC, demonstrate that our model not only outperforms the advanced depression methods but that the discriminative representations of facial behaviors provide potential insights about depression. Our code is available at:https://github.com/xkwangcn/MCCL.git.

Original languageEnglish
JournalIEEE Transactions on Circuits and Systems for Video Technology
DOIs
Publication statusAccepted/In press - 2025

Keywords

  • contrastive learning
  • Facial depression estimation
  • multi-cue

Fingerprint

Dive into the research topics of 'Facial Depression Estimation via Multi-Cue Contrastive Learning'. Together they form a unique fingerprint.

Cite this