TY - JOUR
T1 - Deep Scalogram Representations for Acoustic Scene Classification
AU - Ren, Zhao
AU - Qian, Kun
AU - Wang, Yebin
AU - Zhang, Zixing
AU - Pandit, Vedhas
AU - Baird, Alice
AU - Schuller, Björn
N1 - Publisher Copyright:
© 2014 Chinese Association of Automation.
PY - 2018/5
Y1 - 2018/5
N2 - Spectrogram representations of acoustic scenes have achieved competitive performance for acoustic scene classification. Yet, the spectrogram alone does not take into account a substantial amount of time-frequency information. In this study, we present an approach for exploring the benefits of deep scalogram representations, extracted in segments from an audio stream. The approach presented firstly transforms the segmented acoustic scenes into bump and Morse scalograms, as well as spectrograms; secondly, the spectrograms or scalograms are fed into pre-trained convolutional neural networks; thirdly, the features extracted from a subsequent fully connected layer are fed into (bidirectional) gated recurrent neural networks, which are followed by a single highway layer and a softmax layer; finally, predictions from these three systems are fused by a margin sampling value strategy. We then evaluate the proposed approach using the acoustic scene classification dataset of the 2017 IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE). On the evaluation set, an accuracy of 64.0% from bidirectional gated recurrent neural networks is obtained when fusing the spectrogram and the bump scalogram, an improvement on the 61.0% baseline result provided by the DCASE 2017 organisers. This result shows that extracted bump scalograms are capable of improving the classification accuracy when fused with a spectrogram-based system.
AB - Spectrogram representations of acoustic scenes have achieved competitive performance for acoustic scene classification. Yet, the spectrogram alone does not take into account a substantial amount of time-frequency information. In this study, we present an approach for exploring the benefits of deep scalogram representations, extracted in segments from an audio stream. The approach presented firstly transforms the segmented acoustic scenes into bump and Morse scalograms, as well as spectrograms; secondly, the spectrograms or scalograms are fed into pre-trained convolutional neural networks; thirdly, the features extracted from a subsequent fully connected layer are fed into (bidirectional) gated recurrent neural networks, which are followed by a single highway layer and a softmax layer; finally, predictions from these three systems are fused by a margin sampling value strategy. We then evaluate the proposed approach using the acoustic scene classification dataset of the 2017 IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE). On the evaluation set, an accuracy of 64.0% from bidirectional gated recurrent neural networks is obtained when fusing the spectrogram and the bump scalogram, an improvement on the 61.0% baseline result provided by the DCASE 2017 organisers. This result shows that extracted bump scalograms are capable of improving the classification accuracy when fused with a spectrogram-based system.
KW - (bidirectional) gated recurrent neural networks ((B) GRNNs)
KW - Acoustic scene classification (ASC)
KW - convolutional neural networks (CNNs)
KW - deep scalogram representation
KW - spectrogram representation
UR - http://www.scopus.com/inward/record.url?scp=85045329590&partnerID=8YFLogxK
U2 - 10.1109/JAS.2018.7511066
DO - 10.1109/JAS.2018.7511066
M3 - Article
AN - SCOPUS:85045329590
SN - 2329-9266
VL - 5
SP - 662
EP - 669
JO - IEEE/CAA Journal of Automatica Sinica
JF - IEEE/CAA Journal of Automatica Sinica
IS - 3
ER -