Hierarchical Dynamics Aggregation Network for Speech-based Depression Detection

  • Li Zhou
  • Ling Li
  • Rushi Lan
  • Zhenyu Liu
  • Xiaonan Luo*
  • Bin Hu
  • *Corresponding author for this work
  • Guilin University of Electronic Technology
  • Lanzhou University

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Speech signals are non-invasive and inexpensive to collect, which has made them a key modality for the objective assessment of depression. However, existing methods struggle to capture the hierarchical dynamic structure of speech, which limits the discriminability of the learned representations. To address this issue, this paper proposes a Hierarchical Dynamics Aggregation Network (HDAN). Following a hierarchical dynamic modeling paradigm, the model first constructs context-aware first-order acoustic state representations. On this basis, a Multi-Scale Dynamics Extraction (MDE) module and a Dynamic Relation Network (DRN) extract and aggregate speech dynamics across multiple temporal scales into a unified global dynamic representation. A Dynamic Synergistic Memory (DSM) module then aligns and enhances sample-level dynamics with learnable prototypes. Finally, a Mask-based Cross Fusion (MCF) module adaptively fuses global dynamics and content semantics into a joint representation that accounts for both content and dynamics. Comparative experiments on the Androids Corpus and the Clinical Dataset show that HDAN consistently outperforms multiple baseline models across various metrics. Ablation studies further show that each submodule contributes positively to performance, supporting the structural design.
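
The abstract does not specify how the MDE module computes dynamics, so the following is only a minimal illustrative sketch of the general idea of describing speech dynamics at several temporal scales. It assumes that "dynamics" can be approximated by lagged first-order differences between frame-level feature vectors, averaged per scale and concatenated. The function names (`lagged_deltas`, `multi_scale_dynamics`) and the lag values are hypothetical and are not taken from HDAN.

```python
from typing import List, Sequence


def lagged_deltas(frames: Sequence[Sequence[float]], lag: int) -> List[List[float]]:
    """Differences between frames that are `lag` steps apart (assumed proxy for dynamics)."""
    return [
        [b - a for a, b in zip(frames[t], frames[t + lag])]
        for t in range(len(frames) - lag)
    ]


def multi_scale_dynamics(
    frames: Sequence[Sequence[float]], lags: Sequence[int] = (1, 2, 4)
) -> List[float]:
    """Concatenate the mean absolute lagged difference per feature for each lag.

    This is a hand-crafted stand-in for a learned multi-scale module,
    not the MDE module described in the paper.
    """
    if not frames:
        raise ValueError("frames must contain at least one frame")
    dim = len(frames[0])
    summary: List[float] = []
    for lag in lags:
        deltas = lagged_deltas(frames, lag)
        if deltas:
            means = [sum(abs(row[d]) for row in deltas) / len(deltas) for d in range(dim)]
        else:
            # Sequence shorter than the lag: no dynamics observable at this scale.
            means = [0.0] * dim
        summary.extend(means)
    return summary


# Five frames with two features each (toy values, not real acoustic features).
toy_frames = [[0.0, 1.0], [1.0, 1.0], [2.0, 3.0], [3.0, 3.0], [4.0, 5.0]]
dynamics = multi_scale_dynamics(toy_frames)
print(dynamics)
```

In a learned model the fixed differences and averaging would typically be replaced by trainable layers; the sketch only shows how a single sequence yields one descriptor per temporal scale that can then be aggregated.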

Original language: English
Journal: IEEE Transactions on Affective Computing
DOI
Publication status: Accepted/In press - 2026
Externally published
