Abstract
Non-intrusive speech quality assessment is increasingly important in speech systems. Existing methods predominantly rely on neural networks to extract low-order features, which typically pass through a low-dimensional linear transformation to yield the network's output. However, the intercorrelation between feature points is often overlooked. In this paper, we explore the kernel method, which maps features into a high-dimensional space through dot products, to better capture the relationships among all feature points. Considering the unique advantages of tensors in representing complex data, we extend the use of tensor networks and propose a novel framework that incorporates a matrix product state (MPS) layer to predict the mean opinion score (MOS). With the MPS layer, our model transforms low-order features into higher-order representations, enabling a linear transformation in a high-dimensional space without increasing the number of parameters. Furthermore, we propose a loss function that jointly accounts for regression bias, classification bias, and correlation with the real MOS labels. Experimental results demonstrate that our proposed model consistently outperforms the baseline system across all evaluation metrics and surpasses state-of-the-art models on the test set.
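The idea of applying a linear transformation in a high-dimensional product space through an MPS can be illustrated with a minimal NumPy sketch. This is a generic MPS contraction, not the architecture from the paper: the local feature map `[1, x_i]`, the bond dimension, the random initialisation, and all function names (`feature_map`, `init_mps`, `mps_forward`, `dense_forward`) are assumptions made for illustration. The sketch checks that contracting the MPS cores site by site gives the same scalar as applying the equivalent full weight tensor to the tensor product of the local features.

```python
import numpy as np


def feature_map(x):
    """Map each scalar feature x_i to a 2-dim local vector [1, x_i].

    Hypothetical choice of local map; returns an array of shape (N, 2).
    """
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), x], axis=-1)


def init_mps(n_sites, bond_dim, phys_dim=2, seed=0):
    """Create random MPS cores of shape (left_bond, phys_dim, right_bond).

    The boundary bonds have dimension 1 so the full contraction is a scalar.
    """
    rng = np.random.default_rng(seed)
    cores = []
    for i in range(n_sites):
        left = 1 if i == 0 else bond_dim
        right = 1 if i == n_sites - 1 else bond_dim
        cores.append(rng.normal(scale=0.5, size=(left, phys_dim, right)))
    return cores


def mps_forward(cores, x):
    """Contract the MPS with the local feature vectors, one site at a time."""
    phi = feature_map(x)
    v = np.ones((1,))  # left boundary vector
    for core, p in zip(cores, phi):
        # Contract the physical index with the local feature: (left, right).
        m = np.einsum("s,lsr->lr", p, core)
        v = v @ m
    return float(v[0])


def dense_forward(cores, x):
    """Reference: build the full weight tensor (phys_dim**N entries) explicitly."""
    phi = feature_map(x)
    w = cores[0][0]  # drop the left boundary bond: (phys, right)
    for core in cores[1:]:
        # Contract the last axis of w with the left bond of the next core.
        w = np.tensordot(w, core, axes=1)
    w = w[..., 0]  # drop the right boundary bond: shape (phys,) * N
    big_phi = phi[0]
    for p in phi[1:]:
        big_phi = np.multiply.outer(big_phi, p)  # tensor product of features
    return float(np.sum(w * big_phi))


if __name__ == "__main__":
    x = [0.3, -1.2, 0.5, 2.0]
    cores = init_mps(n_sites=len(x), bond_dim=3)
    print("MPS output:  ", mps_forward(cores, x))
    print("Dense output:", dense_forward(cores, x))
    print("MPS parameters:", sum(c.size for c in cores))
```

In this sketch the dense weight tensor has `phys_dim ** N` entries, whereas the MPS stores at most `N * phys_dim * bond_dim ** 2` numbers, so the parameter count grows linearly rather than exponentially with the number of features.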
Original language | English |
---|---|
Pages (from-to) | 851-855 |
Number of pages | 5 |
Journal | Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing |
Publication status | Published - 2024 |
Event | 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Seoul, Republic of Korea |
Duration | 14 Apr 2024 → 19 Apr 2024 |
Keywords
- matrix product state
- multi-task learning
- speech quality assessment
- tensor network