Abstract
With the widespread adoption of the Internet, social networks have become an indispensable part of people's lives. Because social networks capture information about users' daily moods and states, they provide a new avenue for detecting depression. Although most current approaches focus on fusing multimodal features, they ignore the importance of fine-grained behavioral information. In this paper, we propose the Joint Attention Multi-Scale Fusion Network (JAMFN), a model that captures multiscale behavioral cues of depression and leverages the proposed Joint Attention Fusion (JAF) module to extract the temporal importance of each modality, guiding the fusion of multiscale modal pairs. Our experiments are conducted on the D-vlog dataset, and the results demonstrate that JAMFN outperforms all benchmark models, indicating that it can effectively mine latent depressive behavior.
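The paper itself is not reproduced here, but the following minimal sketch illustrates one way a joint-attention fusion of two modality streams could be realized in PyTorch. The module name, feature dimensions, and gating scheme are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of joint-attention fusion over two modalities (not the authors' code).
import torch
import torch.nn as nn


class JointAttentionFusion(nn.Module):
    """Weights two modality streams by jointly learned per-timestep importance,
    then fuses them. Dimensions and gating are illustrative assumptions."""

    def __init__(self, dim: int):
        super().__init__()
        # Scores temporal importance of each modality from their concatenation.
        self.joint_score = nn.Linear(2 * dim, 2)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, audio: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
        # audio, visual: (batch, time, dim)
        joint = torch.cat([audio, visual], dim=-1)              # (B, T, 2*dim)
        weights = torch.softmax(self.joint_score(joint), dim=-1)  # (B, T, 2) modality weights per step
        fused = torch.cat([audio * weights[..., :1],
                           visual * weights[..., 1:]], dim=-1)   # re-weighted concatenation
        return self.proj(fused)                                  # (B, T, dim)


# Usage with random tensors standing in for acoustic and visual feature sequences.
if __name__ == "__main__":
    fuse = JointAttentionFusion(dim=128)
    a = torch.randn(4, 50, 128)
    v = torch.randn(4, 50, 128)
    print(fuse(a, v).shape)  # torch.Size([4, 50, 128])
```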
Original language | English |
---|---|
Pages (from-to) | 3417-3421 |
Number of pages | 5 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Volume | 2023-August |
Publication status | Published - 2023 |
Externally published | Yes |
Event | 24th Annual Conference of the International Speech Communication Association, Interspeech 2023 - Dublin, Ireland, 20 Aug 2023 → 24 Aug 2023 |
Keywords
- Depression detection
- Joint Attention Multi-Scale Fusion Network (JAMFN)
- Vlog