Speech-based Depression Detection Using Unsupervised Autoencoder

Guangyao Sun; Shenghui Zhao; Bochao Zou; Yubo An

doi:10.1109/ICSIP55141.2022.9886372

Guangyao Sun, Shenghui Zhao, Bochao Zou*, Yubo An

*Corresponding author for this work

School of Information and Electronics

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

9 Citations (Scopus)

Abstract

With the rapid development of society, over three hundred million people worldwide suffer from depression, which has become one of the most serious health problems in the world. Depression detection is of great importance for timely treatment. In this paper, a speech-based depression detection method using an unsupervised autoencoder is proposed. Most previous methods encode frame-level speech features into sentence-level features with statistical functions, which leads to the loss of temporal information between frames. To solve this, we propose an unsupervised network based on the transformer. The unsupervised network is used to obtain the audio embedding vector of an audio segment from depressed or non-depressed people. The audio embedding vector is then used for depression detection. The experimental results show that the proposed method achieves superior performance on both the English database DAIC and our self-built Chinese database CMDC.
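The abstract's point about statistical functions can be illustrated with a minimal sketch. It is not the authors' code: the `functionals` helper and the toy feature values are assumptions made up for illustration. It pools frame-level features into one utterance-level vector with per-dimension mean and standard deviation, then shows that reversing the frame order leaves the pooled vector unchanged, so the ordering of frames is not represented.

```python
import statistics

def functionals(frames):
    """Pool frame-level features (a list of per-frame vectors) into one
    utterance-level vector: per-dimension mean followed by per-dimension
    population standard deviation. Hypothetical helper for illustration."""
    dims = list(zip(*frames))  # transpose: one tuple per feature dimension
    means = [statistics.fmean(d) for d in dims]
    stds = [statistics.pstdev(d) for d in dims]
    return means + stds

# Made-up frame-level features: 6 frames, 2 dimensions each.
frames = [[0.0, 1.0], [1.0, 1.5], [2.0, 2.5], [3.0, 2.0], [4.0, 0.5], [5.0, 0.0]]

# The same frames in reverse temporal order.
reordered = frames[::-1]

original_vec = functionals(frames)
reordered_vec = functionals(reordered)

# True when both orderings pool to the same utterance-level vector.
same = all(abs(a - b) < 1e-9 for a, b in zip(original_vec, reordered_vec))
print(same)
```

Because mean and standard deviation depend only on the set of frame values, any sequence-aware encoder (such as the transformer-based network the abstract proposes) has access to information that this kind of pooling discards.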

Original language	English
Title of host publication	2022 7th International Conference on Signal and Image Processing, ICSIP 2022
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	35-38
Number of pages	4
ISBN (Electronic)	9781665495639
DOIs	https://doi.org/10.1109/ICSIP55141.2022.9886372
Publication status	Published - 2022
Event	7th International Conference on Signal and Image Processing, ICSIP 2022 - Suzhou, China; Duration: 20 Jul 2022 → 22 Jul 2022

Publication series

Name	2022 7th International Conference on Signal and Image Processing, ICSIP 2022

Conference

Conference	7th International Conference on Signal and Image Processing, ICSIP 2022
Country/Territory	China
City	Suzhou
Period	20/07/22 → 22/07/22

Keywords

Chinese depression database
depression detection
transformer
unsupervised learning

Access to Document

10.1109/ICSIP55141.2022.9886372

Cite this

Sun, G., Zhao, S., Zou, B., & An, Y. (2022). Speech-based Depression Detection Using Unsupervised Autoencoder. In 2022 7th International Conference on Signal and Image Processing, ICSIP 2022 (pp. 35-38). (2022 7th International Conference on Signal and Image Processing, ICSIP 2022). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICSIP55141.2022.9886372

@inproceedings{26a8b3e32d0648c892c90144f834bada,

title = "Speech-based Depression Detection Using Unsupervised Autoencoder",

abstract = "With the rapid development of society, over three hundred million people worldwide suffer from depression, which has become one of the most serious health problems in the world. As we know, depression detection is of great importance for its timely treatment. In this paper, a speech-based depression detection method using unsupervised autoencoder is proposed. Most previous methods encode the frame-level speech features into sentence-level features with statistical functions which lead to the loss of the temporal information between frames. To solve this, we propose an unsupervised network based on transformer. The unsupervised network is adopted to obtain the audio embedding vector of an audio segment from depressed or non-depressed people. Then the embedding audio vector is used for depression detection. The experimental results show that the proposed method achieves superior performance on both the English database DAIC and our self-built Chinese database CMDC.",

keywords = "Chinese depression database, depression detection, transformer, unsupervised learning",

author = "Guangyao Sun and Shenghui Zhao and Bochao Zou and Yubo An",

note = "Publisher Copyright: {\textcopyright} 2022 IEEE.; 7th International Conference on Signal and Image Processing, ICSIP 2022 ; Conference date: 20-07-2022 Through 22-07-2022",

year = "2022",

doi = "10.1109/ICSIP55141.2022.9886372",

language = "English",

series = "2022 7th International Conference on Signal and Image Processing, ICSIP 2022",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "35--38",

booktitle = "2022 7th International Conference on Signal and Image Processing, ICSIP 2022",

address = "United States",

}

Sun, G, Zhao, S, Zou, B & An, Y 2022, Speech-based Depression Detection Using Unsupervised Autoencoder. in 2022 7th International Conference on Signal and Image Processing, ICSIP 2022. 2022 7th International Conference on Signal and Image Processing, ICSIP 2022, Institute of Electrical and Electronics Engineers Inc., pp. 35-38, 7th International Conference on Signal and Image Processing, ICSIP 2022, Suzhou, China, 20/07/22. https://doi.org/10.1109/ICSIP55141.2022.9886372

Speech-based Depression Detection Using Unsupervised Autoencoder. / Sun, Guangyao; Zhao, Shenghui; Zou, Bochao et al.
2022 7th International Conference on Signal and Image Processing, ICSIP 2022. Institute of Electrical and Electronics Engineers Inc., 2022. p. 35-38 (2022 7th International Conference on Signal and Image Processing, ICSIP 2022).

TY - GEN

T1 - Speech-based Depression Detection Using Unsupervised Autoencoder

AU - Sun, Guangyao

AU - Zhao, Shenghui

AU - Zou, Bochao

AU - An, Yubo

PY - 2022

Y1 - 2022

N2 - With the rapid development of society, over three hundred million people worldwide suffer from depression, which has become one of the most serious health problems in the world. As we know, depression detection is of great importance for its timely treatment. In this paper, a speech-based depression detection method using unsupervised autoencoder is proposed. Most previous methods encode the frame-level speech features into sentence-level features with statistical functions which lead to the loss of the temporal information between frames. To solve this, we propose an unsupervised network based on transformer. The unsupervised network is adopted to obtain the audio embedding vector of an audio segment from depressed or non-depressed people. Then the embedding audio vector is used for depression detection. The experimental results show that the proposed method achieves superior performance on both the English database DAIC and our self-built Chinese database CMDC.

KW - Chinese depression database

KW - depression detection

KW - transformer

KW - unsupervised learning

UR - http://www.scopus.com/inward/record.url?scp=85139448410&partnerID=8YFLogxK

U2 - 10.1109/ICSIP55141.2022.9886372

DO - 10.1109/ICSIP55141.2022.9886372

M3 - Conference contribution

AN - SCOPUS:85139448410

T3 - 2022 7th International Conference on Signal and Image Processing, ICSIP 2022

SP - 35

EP - 38

BT - 2022 7th International Conference on Signal and Image Processing, ICSIP 2022

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 7th International Conference on Signal and Image Processing, ICSIP 2022

Y2 - 20 July 2022 through 22 July 2022

ER -
