Neural Network Model Based on the Tensor Network for Audio Tagging of Domestic Activities

Li Dong Yang; Ren Bo Yue; Jing Wang; Min Liu

doi:10.3389/fphy.2022.863291

Li Dong Yang, Ren Bo Yue, Jing Wang^*, Min Liu

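^*Corresponding author for this work

School of Information and Electronics

Research output: Contribution to journal › Article › Peer-reviewed

Abstract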

Population aging makes the monitoring of domestic activities increasingly important. Audio tagging of domestic activities is well suited to cases where visual data are unavailable because of interference from lighting and the environment. To address this problem, a neural network model based on the tensor network is proposed for audio tagging of domestic activities; it is more interpretable than traditional neural networks. Introducing the tensor network compresses the network parameters and reduces the redundancy of the model while maintaining good performance. First, the important features of a Mel spectrogram of the input audio are extracted by convolutional neural networks (CNNs). They are then mapped into the high-order space corresponding to the tensor network. The spatial structure information and important features are further extracted and retained through the matrix product state (MPS). When the tensor network is used, large patches of the feature data are divided into small, local, orderless patches. The final tagging results are obtained through the MPS layers, which form a tensor network structure based on the tensor train decomposition. To evaluate the proposed method, the DCASE 2018 Challenge Task 5 dataset for monitoring domestic activities is used. The average F1-score of the proposed model on the test set of the development dataset and on the validation dataset reached 87.7% and 85.9%, which are 3.2% and 2.8% higher than the baseline system, respectively. These results indicate that the proposed model performs better and more efficiently for audio tagging of domestic activities.
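As a rough illustration of how an MPS (tensor train) layer can turn a sequence of local features into class scores with far fewer parameters than a full high-order weight tensor, here is a minimal NumPy sketch. It is not the authors' implementation: the cosine/sine local feature map, the bond dimension, the number of sites, the number of classes, and every function name below are assumptions made only for this example.

```python
import numpy as np


def local_feature_map(x):
    """Map each scalar in [0, 1] to a 2-dim vector (an assumed, commonly used choice)."""
    x = np.asarray(x, dtype=float)
    return np.stack([np.cos(np.pi * x / 2), np.sin(np.pi * x / 2)], axis=-1)


def init_mps(n_sites, bond_dim, n_classes, rng):
    """Create random MPS cores of shape (left_bond, 2, right_bond).

    The last core carries the class index on its right leg (a hypothetical layout).
    """
    cores = []
    for i in range(n_sites):
        left = 1 if i == 0 else bond_dim
        right = n_classes if i == n_sites - 1 else bond_dim
        cores.append(rng.normal(scale=0.5, size=(left, 2, right)))
    return cores


def mps_scores(cores, x):
    """Contract the MPS with the mapped input, left to right, giving one score per class."""
    phi = local_feature_map(x)  # shape (n_sites, 2)
    v = np.ones(1)              # trivial left boundary vector
    for core, p in zip(cores, phi):
        # Absorb the local feature vector into the core, then advance along the chain.
        v = v @ np.einsum("ldr,d->lr", core, p)
    return v                    # shape (n_classes,)


def mps_param_count(cores):
    """Total number of trainable entries across all cores."""
    return sum(core.size for core in cores)


rng = np.random.default_rng(0)
n_sites, bond_dim, n_classes = 16, 4, 9  # illustrative values only
cores = init_mps(n_sites, bond_dim, n_classes, rng)
scores = mps_scores(cores, rng.random(n_sites))

# A full weight tensor over the same high-order space needs 2**n_sites entries per class.
full_params = (2 ** n_sites) * n_classes
print(scores.shape, mps_param_count(cores), full_params)
```

With these illustrative sizes the MPS holds 528 parameters, against 589,824 for the uncompressed tensor, which is the kind of compression the abstract refers to. In a trained model the cores would be learned and the scores passed through a suitable output nonlinearity.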

Original language	English
Article number	863291
Journal	Frontiers in Physics
Volume	10
DOI	https://doi.org/10.3389/fphy.2022.863291
Publication status	Published - 12 Apr 2022

Cite this

@article{9d25aa3440e54ee9a5b8e647a0d4f561,

title = "Neural Network Model Based on the Tensor Network for Audio Tagging of Domestic Activities",

abstract = "Due to the serious problem of population aging, monitoring of domestic activities is increasingly important. Audio tagging of domestic activities is very suitable when the visual data are unavailable due to the interference from light and the environment. Aiming at solving this problem, a neural network model based on the tensor network is proposed for audio tagging of domestic activities that is more interpretable than traditional neural networks. The introduction of the tensor network can compress the network parameters and reduce the redundancy of the training model while maintaining a good performance. First, the important features of a Mel spectrogram of the input audio are extracted through the convolutional neural networks (CNNs). Then, they are converted into the high-order space corresponding with the tensor network. The spatial structure information and important features can be further extracted and retained through the matrix product state (MPS). Large patches of the featured data are divided into small local orderless patches when using the tensor network. The final tagging results are obtained through the MPS layers which is just a tensor network structure based on the tensor train decomposition. In order to evaluate the proposed method, the DCASE 2018 challenge task 5 dataset for monitoring domestic activities is selected. The results showed that the average F1-score of the proposed model in the test set of the development dataset and validation dataset reached 87.7 and 85.9%, which are 3.2 and 2.8% higher than the baseline system, respectively. It is verified that the proposed model can perform better and more efficiently for audio tagging of domestic activities.",

keywords = "audio tagging, matrix product state (MPS), neural network, tensor network, tensor train decomposition",

author = "Yang, {Li Dong} and Yue, {Ren Bo} and Wang, Jing and Liu, Min",

note = "Publisher Copyright: Copyright {\textcopyright} 2022 Yang, Yue, Wang and Liu.",

year = "2022",

month = apr,

day = "12",

doi = "10.3389/fphy.2022.863291",

language = "English",

volume = "10",

journal = "Frontiers in Physics",

issn = "2296-424X",

publisher = "Frontiers Media SA",

}

TY - JOUR

T1 - Neural Network Model Based on the Tensor Network for Audio Tagging of Domestic Activities

AU - Yang, Li Dong

AU - Yue, Ren Bo

AU - Wang, Jing

AU - Liu, Min

PY - 2022/4/12

Y1 - 2022/4/12

N2 - Due to the serious problem of population aging, monitoring of domestic activities is increasingly important. Audio tagging of domestic activities is very suitable when the visual data are unavailable due to the interference from light and the environment. Aiming at solving this problem, a neural network model based on the tensor network is proposed for audio tagging of domestic activities that is more interpretable than traditional neural networks. The introduction of the tensor network can compress the network parameters and reduce the redundancy of the training model while maintaining a good performance. First, the important features of a Mel spectrogram of the input audio are extracted through the convolutional neural networks (CNNs). Then, they are converted into the high-order space corresponding with the tensor network. The spatial structure information and important features can be further extracted and retained through the matrix product state (MPS). Large patches of the featured data are divided into small local orderless patches when using the tensor network. The final tagging results are obtained through the MPS layers which is just a tensor network structure based on the tensor train decomposition. In order to evaluate the proposed method, the DCASE 2018 challenge task 5 dataset for monitoring domestic activities is selected. The results showed that the average F1-score of the proposed model in the test set of the development dataset and validation dataset reached 87.7 and 85.9%, which are 3.2 and 2.8% higher than the baseline system, respectively. It is verified that the proposed model can perform better and more efficiently for audio tagging of domestic activities.

AB - Due to the serious problem of population aging, monitoring of domestic activities is increasingly important. Audio tagging of domestic activities is very suitable when the visual data are unavailable due to the interference from light and the environment. Aiming at solving this problem, a neural network model based on the tensor network is proposed for audio tagging of domestic activities that is more interpretable than traditional neural networks. The introduction of the tensor network can compress the network parameters and reduce the redundancy of the training model while maintaining a good performance. First, the important features of a Mel spectrogram of the input audio are extracted through the convolutional neural networks (CNNs). Then, they are converted into the high-order space corresponding with the tensor network. The spatial structure information and important features can be further extracted and retained through the matrix product state (MPS). Large patches of the featured data are divided into small local orderless patches when using the tensor network. The final tagging results are obtained through the MPS layers which is just a tensor network structure based on the tensor train decomposition. In order to evaluate the proposed method, the DCASE 2018 challenge task 5 dataset for monitoring domestic activities is selected. The results showed that the average F1-score of the proposed model in the test set of the development dataset and validation dataset reached 87.7 and 85.9%, which are 3.2 and 2.8% higher than the baseline system, respectively. It is verified that the proposed model can perform better and more efficiently for audio tagging of domestic activities.

KW - audio tagging

KW - matrix product state (MPS)

KW - neural network

KW - tensor network

KW - tensor train decomposition

UR - http://www.scopus.com/inward/record.url?scp=85128730596&partnerID=8YFLogxK

U2 - 10.3389/fphy.2022.863291

DO - 10.3389/fphy.2022.863291

M3 - Article

AN - SCOPUS:85128730596

SN - 2296-424X

VL - 10

JO - Frontiers in Physics

JF - Frontiers in Physics

M1 - 863291

ER -
