Improving speech recognition models with small samples for air traffic control systems

Yi Lin, Qin Li, Bo Yang, Zhen Yan, Huachun Tan*, Zhengmao Chen

*Corresponding author for this work

doi:10.1016/j.neucom.2020.08.092

Research output: Contribution to journal › Article › peer-review

34 Citations (Scopus)

Abstract

In the domain of air traffic control (ATC) systems, efforts to train a practical automatic speech recognition (ASR) model always face the problem of small training samples, since the collection and annotation of speech samples are expert- and domain-dependent tasks. In this work, a novel training approach based on pretraining and transfer learning is proposed to address this issue, and an improved end-to-end deep learning model is developed to address the specific challenges of ASR in the ATC domain. An unsupervised pretraining strategy is first proposed to learn speech representations from unlabeled samples for a certain dataset. Specifically, a masking strategy is applied to improve the diversity of the samples without losing their general patterns. Subsequently, transfer learning is applied to fine-tune a pretrained model or another optimized baseline model to finally achieve the supervised ASR task. By virtue of the common terminology used in the ATC domain, the transfer learning task can be regarded as a sub-domain adaptation task, in which the transferred model is optimized using a joint corpus consisting of baseline samples and new transcribed samples from the target dataset. This joint corpus construction strategy enriches the size and diversity of the training samples, which is important for addressing the issue of the small transcribed corpus. In addition, speed perturbation is applied to augment the new transcribed samples to further improve the quality of the speech corpus. Three real ATC datasets are used to validate the proposed ASR model and training strategies. The experimental results demonstrate that the ASR performance is significantly improved on all three datasets, with an absolute character error rate only one-third of that achieved through supervised training. The applicability of the proposed strategies to other ASR approaches is also validated.
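The abstract reports its results as a character error rate (CER) but does not define the metric. As a minimal sketch, assuming the conventional definition (character-level Levenshtein distance divided by the total number of reference characters) rather than the authors' own scoring procedure, CER could be computed as follows. The function names `edit_distance` and `character_error_rate` are illustrative, and counting spaces as characters is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Insertions, deletions, and substitutions each cost 1.
    """
    # prev[j] holds the distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance between ref[:i] and the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute r with h, or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(references, hypotheses):
    """Corpus-level CER: total character edits / total reference characters.

    Assumption: every character, including spaces, is counted.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have equal length")
    total_edits = 0
    total_chars = 0
    for ref, hyp in zip(references, hypotheses):
        total_edits += edit_distance(ref, hyp)
        total_chars += len(ref)
    if total_chars == 0:
        raise ValueError("reference transcripts contain no characters")
    return total_edits / total_chars
```

Under this definition, a single substituted character in a four-character reference (for instance `"abcd"` recognized as `"abed"`) gives a CER of 0.25.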

Original language	English
Pages (from-to)	287-297
Number of pages	11
Journal	Neurocomputing
Volume	445
DOIs	https://doi.org/10.1016/j.neucom.2020.08.092
Publication status	Published - 20 Jul 2021
Externally published	Yes

Keywords

Air traffic control system
Automatic speech recognition
Deep learning
Pretraining
Small training samples
Transfer learning

Cite this

@article{7bde5c4783514e40a23a1fb13ca7e2f3,

title = "Improving speech recognition models with small samples for air traffic control systems",

abstract = "In the domain of air traffic control (ATC) systems, efforts to train a practical automatic speech recognition (ASR) model always face the problem of small training samples, since the collection and annotation of speech samples are expert- and domain-dependent tasks. In this work, a novel training approach based on pretraining and transfer learning is proposed to address this issue, and an improved end-to-end deep learning model is developed to address the specific challenges of ASR in the ATC domain. An unsupervised pretraining strategy is first proposed to learn speech representations from unlabeled samples for a certain dataset. Specifically, a masking strategy is applied to improve the diversity of the samples without losing their general patterns. Subsequently, transfer learning is applied to fine-tune a pretrained model or another optimized baseline model to finally achieve the supervised ASR task. By virtue of the common terminology used in the ATC domain, the transfer learning task can be regarded as a sub-domain adaptation task, in which the transferred model is optimized using a joint corpus consisting of baseline samples and new transcribed samples from the target dataset. This joint corpus construction strategy enriches the size and diversity of the training samples, which is important for addressing the issue of the small transcribed corpus. In addition, speed perturbation is applied to augment the new transcribed samples to further improve the quality of the speech corpus. Three real ATC datasets are used to validate the proposed ASR model and training strategies. The experimental results demonstrate that the ASR performance is significantly improved on all three datasets, with an absolute character error rate only one-third of that achieved through supervised training. The applicability of the proposed strategies to other ASR approaches is also validated.",

keywords = "Air traffic control system, Automatic speech recognition, Deep learning, Pretraining, Small training samples, Transfer learning",

author = "Yi Lin and Qin Li and Bo Yang and Zhen Yan and Huachun Tan and Zhengmao Chen",

note = "Publisher Copyright: {\textcopyright} 2021 Elsevier B.V.",

year = "2021",

month = jul,

day = "20",

doi = "10.1016/j.neucom.2020.08.092",

language = "English",

volume = "445",

pages = "287--297",

journal = "Neurocomputing",

issn = "0925-2312",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - Improving speech recognition models with small samples for air traffic control systems

AU - Lin, Yi

AU - Li, Qin

AU - Yang, Bo

AU - Yan, Zhen

AU - Tan, Huachun

AU - Chen, Zhengmao

PY - 2021/7/20

Y1 - 2021/7/20

AB - In the domain of air traffic control (ATC) systems, efforts to train a practical automatic speech recognition (ASR) model always face the problem of small training samples, since the collection and annotation of speech samples are expert- and domain-dependent tasks. In this work, a novel training approach based on pretraining and transfer learning is proposed to address this issue, and an improved end-to-end deep learning model is developed to address the specific challenges of ASR in the ATC domain. An unsupervised pretraining strategy is first proposed to learn speech representations from unlabeled samples for a certain dataset. Specifically, a masking strategy is applied to improve the diversity of the samples without losing their general patterns. Subsequently, transfer learning is applied to fine-tune a pretrained model or another optimized baseline model to finally achieve the supervised ASR task. By virtue of the common terminology used in the ATC domain, the transfer learning task can be regarded as a sub-domain adaptation task, in which the transferred model is optimized using a joint corpus consisting of baseline samples and new transcribed samples from the target dataset. This joint corpus construction strategy enriches the size and diversity of the training samples, which is important for addressing the issue of the small transcribed corpus. In addition, speed perturbation is applied to augment the new transcribed samples to further improve the quality of the speech corpus. Three real ATC datasets are used to validate the proposed ASR model and training strategies. The experimental results demonstrate that the ASR performance is significantly improved on all three datasets, with an absolute character error rate only one-third of that achieved through supervised training. The applicability of the proposed strategies to other ASR approaches is also validated.

KW - Air traffic control system

KW - Automatic speech recognition

KW - Deep learning

KW - Pretraining

KW - Small training samples

KW - Transfer learning

UR - http://www.scopus.com/inward/record.url?scp=85103652677&partnerID=8YFLogxK

U2 - 10.1016/j.neucom.2020.08.092

DO - 10.1016/j.neucom.2020.08.092

M3 - Article

AN - SCOPUS:85103652677

SN - 0925-2312

VL - 445

SP - 287

EP - 297

JO - Neurocomputing

JF - Neurocomputing

ER -
