Partially Fake Audio Detection Based on MOSNet with Pretraining Models

Hanyue Liu, Jianqian Zhang, Jing Wang*, Miao Liu, Liang Xu, Yi Sun

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The rapid development of speech synthesis and voice conversion technologies poses many potential risks to people's information security and privacy. It is therefore important to build techniques that identify manipulated regions in audio. In this paper, we propose a novel partially fake audio detection system based on MOSNet, a speech quality assessment network, and pretraining models. Features extracted by pretraining models are compared with Mel-spectrogram features. Experimental results show that the proposed system combining MOSNet with the XLS-R-300m pretraining model achieves the best performance on both the evaluation set and the test set, and generalizes well. The final score of the proposed system on the test set is 5.97% higher than that of the RawNet-based baseline system.

Original language: English
Title of host publication: Proceedings of 2023 7th Asian Conference on Artificial Intelligence Technology, ACAIT 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 899-903
Number of pages: 5
ISBN (Electronic): 9798350359145
Publication status: Published - 2023
Event: 7th Asian Conference on Artificial Intelligence Technology, ACAIT 2023 - Quzhou, China
Duration: 10 Nov 2023 to 12 Nov 2023

Publication series

Name: Proceedings of 2023 7th Asian Conference on Artificial Intelligence Technology, ACAIT 2023

Conference

Conference: 7th Asian Conference on Artificial Intelligence Technology, ACAIT 2023
Country/Territory: China
City: Quzhou
Period: 10/11/23 to 12/11/23

Keywords

  • deep learning
  • manipulation region location
  • partially fake audio detection
  • pretraining models
