TY - GEN
T1 - The Xiaomi Text-to-Text Simultaneous Speech Translation System for IWSLT 2022
AU - Guo, Bao
AU - Liu, Mengge
AU - Zhang, Wen
AU - Chen, Hexuan
AU - Mu, Chang
AU - Li, Xiang
AU - Cui, Jianwei
AU - Wang, Bin
AU - Guo, Yuhang
N1 - Publisher Copyright:
© 2022 Association for Computational Linguistics.
PY - 2022
Y1 - 2022
N2 - This system paper describes the Xiaomi Translation System for the IWSLT 2022 Simultaneous Speech Translation (SST) shared task. We participate in the English-to-Mandarin Chinese Text-to-Text (T2T) track. Our system is built on the Transformer model with novel techniques borrowed from our recent research work. For data filtering, language-model-based and rule-based methods are applied to obtain high-quality bilingual parallel corpora. We further strengthen the system with dominant data augmentation techniques such as knowledge distillation, tagged back-translation, and iterative back-translation. We also incorporate training techniques such as R-drop, deep models, and large-batch training, which have been shown to benefit the vanilla Transformer model. In the SST scenario, several variations of the wait-k strategy are explored. Furthermore, in terms of robustness, both data-based and model-based approaches are used to reduce the sensitivity of our system to Automatic Speech Recognition (ASR) outputs. Finally, we design several inference algorithms and use an adaptive-ensemble method based on multiple model variants to further improve system performance. Compared with strong baselines, fusing all of these techniques improves our system by 2-3 BLEU points under different latency regimes.
AB - This system paper describes the Xiaomi Translation System for the IWSLT 2022 Simultaneous Speech Translation (SST) shared task. We participate in the English-to-Mandarin Chinese Text-to-Text (T2T) track. Our system is built on the Transformer model with novel techniques borrowed from our recent research work. For data filtering, language-model-based and rule-based methods are applied to obtain high-quality bilingual parallel corpora. We further strengthen the system with dominant data augmentation techniques such as knowledge distillation, tagged back-translation, and iterative back-translation. We also incorporate training techniques such as R-drop, deep models, and large-batch training, which have been shown to benefit the vanilla Transformer model. In the SST scenario, several variations of the wait-k strategy are explored. Furthermore, in terms of robustness, both data-based and model-based approaches are used to reduce the sensitivity of our system to Automatic Speech Recognition (ASR) outputs. Finally, we design several inference algorithms and use an adaptive-ensemble method based on multiple model variants to further improve system performance. Compared with strong baselines, fusing all of these techniques improves our system by 2-3 BLEU points under different latency regimes.
UR - http://www.scopus.com/inward/record.url?scp=85137422039&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85137422039
T3 - IWSLT 2022 - 19th International Conference on Spoken Language Translation, Proceedings of the Conference
SP - 216
EP - 224
BT - IWSLT 2022 - 19th International Conference on Spoken Language Translation, Proceedings of the Conference
A2 - Salesky, Elizabeth
A2 - Federico, Marcello
A2 - Costa-jussà, Marta
PB - Association for Computational Linguistics (ACL)
T2 - 19th International Conference on Spoken Language Translation, IWSLT 2022
Y2 - 26 May 2022 through 27 May 2022
ER -