A Timbre Attribute Discrimination System Fusing Pre-trained Speaker Feature Extractors with Gender Prior Features

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This paper presents the system submitted to Track 1 of the Voice Timbre Attribute Detection (vTAD) 2025 Challenge. The core of the vTAD challenge is an intensity comparison task: determining the relative strength of perceptually defined timbre attributes between two speech signals. The system uses pre-trained speaker representations together with gender representations as front-end inputs, and a residual neural network outputs the intensity comparison result for each speech pair under a specified descriptor. The system secured third place on the Seen track of the vTAD 2025 Challenge, with an accuracy of 95.38% and an equal error rate (EER) of 4.98%.
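Both reported metrics treat each (speech pair, descriptor) comparison as a binary decision. The sketch below shows how accuracy and EER are commonly computed for such decisions; it is illustrative only and not taken from the paper. It assumes, hypothetically, that the system emits one score per pair, where a higher score means the second utterance is judged stronger on the descriptor (label 1), and that accuracy uses a fixed 0.5 threshold. The function names `accuracy` and `equal_error_rate` are hypothetical.

```python
from typing import Sequence, Tuple


def accuracy(scores: Sequence[float], labels: Sequence[int], threshold: float = 0.5) -> float:
    """Fraction of pairs whose thresholded score matches the label.

    Assumption: label 1 means the second utterance is stronger on the
    descriptor, and a score >= threshold predicts label 1.
    """
    correct = sum(int(s >= threshold) == y for s, y in zip(scores, labels))
    return correct / len(labels)


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (eer, threshold) by sweeping thresholds over the observed scores.

    FAR = fraction of label-0 pairs accepted (score >= t);
    FRR = fraction of label-1 pairs rejected (score < t).
    The EER is taken at the threshold where FAR and FRR are closest,
    reported as their mean.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    best_gap, best_eer, best_t = float("inf"), 0.0, 0.0
    for t in sorted(set(scores)):
        far = sum(s >= t for s in negatives) / len(negatives)
        frr = sum(s < t for s in positives) / len(positives)
        if abs(far - frr) < best_gap:
            best_gap, best_eer, best_t = abs(far - frr), (far + frr) / 2, t
    return best_eer, best_t


if __name__ == "__main__":
    # Toy scores for four hypothetical pairs (not data from the paper).
    toy_scores = [0.9, 0.4, 0.6, 0.1]
    toy_labels = [1, 1, 0, 0]
    print(accuracy(toy_scores, toy_labels))
    print(equal_error_rate(toy_scores, toy_labels))
```

Accuracy depends on the chosen threshold, while EER summarizes the score distribution independently of it, which is why challenge results often report both.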

Original language: English
Title of host publication: Man-Machine Speech Communication - 20th National Conference, NCMMSC 2025, Proceedings
Editors: Jia Jia, Zhiyong Wu, Lijian Gao, Gongping Huang, Ya Li
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 470-481
Number of pages: 12
ISBN (Print): 9789819553815
DOIs
Publication status: Published - 2026
Event: 20th National Conference on Man-Machine Speech Communication, NCMMSC 2025 - Zhenjiang, China
Duration: 16 Oct 2025 to 19 Oct 2025

Publication series

Name: Communications in Computer and Information Science
Volume: 2662 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 20th National Conference on Man-Machine Speech Communication, NCMMSC 2025
Country/Territory: China
City: Zhenjiang
Period: 16/10/25 to 19/10/25

Keywords

  • Speaker Embedding
  • Voice Analysis
  • vTAD
