TY - JOUR
T1 - A self-supervised model for language identification integrating phonological knowledge
AU - Zhan, Qingran
AU - Xie, Xiang
AU - Hu, Chenguang
AU - Cheng, Haobo
N1 - Publisher Copyright:
© 2021 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2021/9
Y1 - 2021/9
N2 - In this paper, a self-supervised pre-trained model is proposed and successfully applied to the language identification (LID) task. A Transformer encoder is employed, and a multi-task strategy is used to train the self-supervised model: the first task is to reconstruct the masked spans of input frames, and the second is a supervised task in which phoneme and phonological labels are used with the Connectionist Temporal Classification (CTC) loss. With this multi-task learning loss, the model is expected to capture high-level speech representations in the phonological space. Meanwhile, an adaptive loss is applied to balance the weights between the different tasks. After the pre-training stage, the self-supervised model is used in x-vector systems. Our LID experiments are carried out on the oriental language recognition (OLR) challenge corpus, using the 1 s, 3 s, and full-length test sets. Experimental results show that the feature-extraction approach achieves the best performance on the 1 s test set, while the fine-tuning approach performs best on the 3 s and full-length test sets. Furthermore, our results demonstrate that the multi-task training strategy is effective and that the proposed model achieves the best overall performance.
AB - In this paper, a self-supervised pre-trained model is proposed and successfully applied to the language identification (LID) task. A Transformer encoder is employed, and a multi-task strategy is used to train the self-supervised model: the first task is to reconstruct the masked spans of input frames, and the second is a supervised task in which phoneme and phonological labels are used with the Connectionist Temporal Classification (CTC) loss. With this multi-task learning loss, the model is expected to capture high-level speech representations in the phonological space. Meanwhile, an adaptive loss is applied to balance the weights between the different tasks. After the pre-training stage, the self-supervised model is used in x-vector systems. Our LID experiments are carried out on the oriental language recognition (OLR) challenge corpus, using the 1 s, 3 s, and full-length test sets. Experimental results show that the feature-extraction approach achieves the best performance on the 1 s test set, while the fine-tuning approach performs best on the 3 s and full-length test sets. Furthermore, our results demonstrate that the multi-task training strategy is effective and that the proposed model achieves the best overall performance.
KW - Language identification
KW - Phonological knowledge
KW - Self-supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85114810958&partnerID=8YFLogxK
U2 - 10.3390/electronics10182259
DO - 10.3390/electronics10182259
M3 - Article
AN - SCOPUS:85114810958
SN - 2079-9292
VL - 10
JO - Electronics (Switzerland)
JF - Electronics (Switzerland)
IS - 18
M1 - 2259
ER -