Domain-adversarial based model with phonological knowledge for cross-lingual speech recognition

doi:10.3390/electronics10243172

Qingran Zhan, Xiang Xie*, Chenguang Hu, Juan Zuluaga-Gomez, Jing Wang, Haobo Cheng

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)

Abstract

Phonological-based features (articulatory features, AFs) describe the movements of the vocal organs, which are shared across languages. This paper investigates a domain-adversarial neural network (DANN) to extract reliable AFs, and different multi-stream techniques are used for cross-lingual speech recognition. First, a novel universal phonological attribute definition is proposed for Mandarin, English, German and French. Then a DANN-based AFs detector is trained using the source languages (English, German and French). For cross-lingual speech recognition, the AFs detectors are used to transfer phonological knowledge from the source languages to the target language (Mandarin). Two multi-stream approaches are introduced to fuse the acoustic features and cross-lingual AFs. In addition, the monolingual AFs system (i.e., the AFs are extracted directly from the target language) is also investigated. Experiments show that the performance of the AFs detector can be improved by using convolutional neural networks (CNN) with a domain-adversarial learning method. The multi-head attention (MHA) based multi-stream approach reaches the best performance compared to the baseline, the cross-lingual adaptation approach, and other approaches. More specifically, the MHA-mode with cross-lingual AFs yields significant improvements over monolingual AFs when the training data size is restricted, and it can be easily extended to other low-resource languages.
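The abstract does not describe how the domain-adversarial training is implemented. As a rough illustration only, DANNs are commonly built around a gradient reversal layer: features pass through unchanged in the forward direction, while the gradient coming back from the domain (here, language) classifier is negated and scaled before it reaches the shared feature extractor. The sketch below uses plain Python lists instead of a deep-learning framework; the names `GradientReversal`, `lam`, and `feature_extractor_gradient` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class GradientReversal:
    """Identity on the forward pass; scales gradients by -lam on the backward pass.

    Hypothetical illustration, not the authors' implementation.
    """
    lam: float = 1.0  # assumed trade-off weight for the adversarial term

    def forward(self, features):
        # Features reach the domain classifier unchanged.
        return list(features)

    def backward(self, grad_from_domain_classifier):
        # Reverse (and scale) the gradient flowing back to the feature extractor.
        return [-self.lam * g for g in grad_from_domain_classifier]


def feature_extractor_gradient(grad_af_loss, grad_domain_loss, grl):
    """Sum the AF-detection gradient and the reversed domain gradient."""
    reversed_grad = grl.backward(grad_domain_loss)
    return [a + d for a, d in zip(grad_af_loss, reversed_grad)]


grl = GradientReversal(lam=0.5)
combined = feature_extractor_gradient([1.0, 1.0], [2.0, -4.0], grl)
```

With this sign flip, an update that lowers the AF-detection loss also pushes the shared features toward being less informative about the source language, which is the general idea behind domain-adversarial training.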

Original language	English
Article number	3172
Journal	Electronics (Switzerland)
Volume	10
Issue number	24
DOIs	https://doi.org/10.3390/electronics10243172
Publication status	Published - 1 Dec 2021

Keywords

Articulatory features
Cross-lingual automatic speech recognition (ASR)
Domain-adversarial neural network
Multi-stream learning

Cite this

@article{14ede7ae046d499d877de9a9a7f09789,

title = "Domain-adversarial based model with phonological knowledge for cross-lingual speech recognition",

abstract = "Phonological-based features (articulatory features, AFs) describe the movements of the vocal organs, which are shared across languages. This paper investigates a domain-adversarial neural network (DANN) to extract reliable AFs, and different multi-stream techniques are used for cross-lingual speech recognition. First, a novel universal phonological attribute definition is proposed for Mandarin, English, German and French. Then a DANN-based AFs detector is trained using the source languages (English, German and French). For cross-lingual speech recognition, the AFs detectors are used to transfer phonological knowledge from the source languages to the target language (Mandarin). Two multi-stream approaches are introduced to fuse the acoustic features and cross-lingual AFs. In addition, the monolingual AFs system (i.e., the AFs are extracted directly from the target language) is also investigated. Experiments show that the performance of the AFs detector can be improved by using convolutional neural networks (CNN) with a domain-adversarial learning method. The multi-head attention (MHA) based multi-stream approach reaches the best performance compared to the baseline, the cross-lingual adaptation approach, and other approaches. More specifically, the MHA-mode with cross-lingual AFs yields significant improvements over monolingual AFs when the training data size is restricted, and it can be easily extended to other low-resource languages.",

keywords = "Articulatory features, Cross-lingual automatic speech recognition (ASR), Domain-adversarial neural network, Multi-stream learning",

author = "Qingran Zhan and Xiang Xie and Chenguang Hu and Juan Zuluaga-Gomez and Jing Wang and Haobo Cheng",

note = "Publisher Copyright: {\textcopyright} 2021 by the authors. Licensee MDPI, Basel, Switzerland.",

year = "2021",

month = dec,

day = "1",

doi = "10.3390/electronics10243172",

language = "English",

volume = "10",

journal = "Electronics (Switzerland)",

issn = "2079-9292",

publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",

number = "24",

}

TY - JOUR

T1 - Domain-adversarial based model with phonological knowledge for cross-lingual speech recognition

AU - Zhan, Qingran

AU - Xie, Xiang

AU - Hu, Chenguang

AU - Zuluaga-Gomez, Juan

AU - Wang, Jing

AU - Cheng, Haobo

PY - 2021/12/1

Y1 - 2021/12/1

N2 - Phonological-based features (articulatory features, AFs) describe the movements of the vocal organs, which are shared across languages. This paper investigates a domain-adversarial neural network (DANN) to extract reliable AFs, and different multi-stream techniques are used for cross-lingual speech recognition. First, a novel universal phonological attribute definition is proposed for Mandarin, English, German and French. Then a DANN-based AFs detector is trained using the source languages (English, German and French). For cross-lingual speech recognition, the AFs detectors are used to transfer phonological knowledge from the source languages to the target language (Mandarin). Two multi-stream approaches are introduced to fuse the acoustic features and cross-lingual AFs. In addition, the monolingual AFs system (i.e., the AFs are extracted directly from the target language) is also investigated. Experiments show that the performance of the AFs detector can be improved by using convolutional neural networks (CNN) with a domain-adversarial learning method. The multi-head attention (MHA) based multi-stream approach reaches the best performance compared to the baseline, the cross-lingual adaptation approach, and other approaches. More specifically, the MHA-mode with cross-lingual AFs yields significant improvements over monolingual AFs when the training data size is restricted, and it can be easily extended to other low-resource languages.

AB - Phonological-based features (articulatory features, AFs) describe the movements of the vocal organs, which are shared across languages. This paper investigates a domain-adversarial neural network (DANN) to extract reliable AFs, and different multi-stream techniques are used for cross-lingual speech recognition. First, a novel universal phonological attribute definition is proposed for Mandarin, English, German and French. Then a DANN-based AFs detector is trained using the source languages (English, German and French). For cross-lingual speech recognition, the AFs detectors are used to transfer phonological knowledge from the source languages to the target language (Mandarin). Two multi-stream approaches are introduced to fuse the acoustic features and cross-lingual AFs. In addition, the monolingual AFs system (i.e., the AFs are extracted directly from the target language) is also investigated. Experiments show that the performance of the AFs detector can be improved by using convolutional neural networks (CNN) with a domain-adversarial learning method. The multi-head attention (MHA) based multi-stream approach reaches the best performance compared to the baseline, the cross-lingual adaptation approach, and other approaches. More specifically, the MHA-mode with cross-lingual AFs yields significant improvements over monolingual AFs when the training data size is restricted, and it can be easily extended to other low-resource languages.

KW - Articulatory features

KW - Cross-lingual automatic speech recognition (ASR)

KW - Domain-adversarial neural network

KW - Multi-stream learning

UR - http://www.scopus.com/inward/record.url?scp=85121395775&partnerID=8YFLogxK

U2 - 10.3390/electronics10243172

DO - 10.3390/electronics10243172

M3 - Article

AN - SCOPUS:85121395775

SN - 2079-9292

VL - 10

JO - Electronics (Switzerland)

JF - Electronics (Switzerland)

IS - 24

M1 - 3172

ER -
