TY - GEN
T1 - Learning Higher Representations from Bioacoustics
T2 - 27th International Conference on Neural Information Processing, ICONIP 2020
AU - Qiao, Yu
AU - Qian, Kun
AU - Zhao, Ziping
N1 - Publisher Copyright:
© 2020, Springer Nature Switzerland AG.
PY - 2020
Y1 - 2020
N2 - In the past two decades, considerable effort has been devoted to the automatic classification of bird sounds, which can enable long-term, unattended, low-energy ubiquitous computing systems for monitoring nature reserves. Nevertheless, hand-crafted features require extensive domain knowledge, which inevitably makes the design process time-consuming and expensive. To this end, we propose a sequence-to-sequence deep learning approach that extracts higher representations automatically from bird sounds without any human expert knowledge. First, we transform the bird sound audio into spectrograms. Subsequently, higher representations are learnt by an autoencoder-based encoder-decoder paradigm combined with deep recurrent neural networks. Finally, two typical machine learning models are selected to predict the classes, i.e., support vector machines and multi-layer perceptrons. Experimental results demonstrate the effectiveness of the proposed method, which achieves an unweighted average recall (UAR) of 66.8% in recognising 86 species of birds.
AB - In the past two decades, considerable effort has been devoted to the automatic classification of bird sounds, which can enable long-term, unattended, low-energy ubiquitous computing systems for monitoring nature reserves. Nevertheless, hand-crafted features require extensive domain knowledge, which inevitably makes the design process time-consuming and expensive. To this end, we propose a sequence-to-sequence deep learning approach that extracts higher representations automatically from bird sounds without any human expert knowledge. First, we transform the bird sound audio into spectrograms. Subsequently, higher representations are learnt by an autoencoder-based encoder-decoder paradigm combined with deep recurrent neural networks. Finally, two typical machine learning models are selected to predict the classes, i.e., support vector machines and multi-layer perceptrons. Experimental results demonstrate the effectiveness of the proposed method, which achieves an unweighted average recall (UAR) of 66.8% in recognising 86 species of birds.
KW - Bioacoustics
KW - Bird sound classification
KW - Deep learning
KW - Internet of Things
KW - Sequence-to-sequence learning
UR - http://www.scopus.com/inward/record.url?scp=85097090322&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-63823-8_16
DO - 10.1007/978-3-030-63823-8_16
M3 - Conference contribution
AN - SCOPUS:85097090322
SN - 9783030638221
T3 - Communications in Computer and Information Science
SP - 130
EP - 138
BT - Neural Information Processing - 27th International Conference, ICONIP 2020, Proceedings
A2 - Yang, Haiqin
A2 - Pasupa, Kitsuchart
A2 - Leung, Andrew Chi-Sing
A2 - Kwok, James T.
A2 - Chan, Jonathan H.
A2 - King, Irwin
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 18 November 2020 through 22 November 2020
ER -