mT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs

Zewen Chi*, Li Dong, Shuming Ma, Shaohan Huang, Xian-Ling Mao, Heyan Huang, Furu Wei

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

36 Citations (Scopus)

Abstract

Multilingual T5 (mT5; Xue et al., 2020) pretrains a sequence-to-sequence model on massive monolingual texts and has shown promising results on many cross-lingual tasks. In this paper, we improve the multilingual text-to-text transfer Transformer with translation pairs (mT6). Specifically, we explore three cross-lingual text-to-text pretraining tasks, namely machine translation, translation pair span corruption, and translation span corruption. In addition, we propose a partially non-autoregressive objective for text-to-text pretraining. We evaluate the methods on eight multilingual benchmark datasets, covering sentence classification, named entity recognition, question answering, and abstractive summarization. Experimental results show that the proposed mT6 improves cross-lingual transferability over mT5.
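
The abstract names the pretraining tasks without detailing their mechanics. The sketch below illustrates the general idea behind translation pair span corruption: a sentence and its translation are concatenated, random spans are replaced with sentinel tokens, and the model learns to reconstruct the masked spans, so recovering a span can exploit tokens from the other language. This is a minimal illustration under simplified assumptions, not the authors' implementation; the function name span_corrupt, whitespace tokenization, and the hyperparameter values are hypothetical, while the <extra_id_N> markers follow T5's sentinel convention.

    import random

    def span_corrupt(tokens, noise_density=0.5, mean_span=2, sentinel=0):
        """Replace random non-overlapping spans with T5-style sentinel
        tokens; the target reconstructs each masked span after its
        sentinel. Returns (corrupted_input, target)."""
        corrupted, target = [], []
        i = 0
        while i < len(tokens):
            # Start a masked span with probability chosen so that roughly
            # noise_density of the tokens are masked on average.
            if random.random() < noise_density / mean_span:
                span_len = min(mean_span, len(tokens) - i)
                marker = f"<extra_id_{sentinel}>"  # T5 sentinel convention
                corrupted.append(marker)
                target.append(marker)
                target.extend(tokens[i:i + span_len])
                sentinel += 1
                i += span_len
            else:
                corrupted.append(tokens[i])
                i += 1
        return corrupted, target

    # Translation pair span corruption (illustrative): concatenate a
    # sentence with its translation, then corrupt spans across the pair.
    en = "Thank you very much".split()
    de = "Vielen Dank".split()
    inp, tgt = span_corrupt(en + ["</s>"] + de)
    print("input :", " ".join(inp))
    print("target:", " ".join(tgt))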

Original language: English
Title of host publication: EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings
Publisher: Association for Computational Linguistics (ACL)
Pages: 1671-1683
Number of pages: 13
ISBN (Electronic): 9781955917094
Publication status: Published - 2021
Event: 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021 - Virtual, Punta Cana, Dominican Republic
Duration: 7 Nov 2021 - 11 Nov 2021

Publication series

Name: EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings

Conference

Conference: 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021
Country/Territory: Dominican Republic
City: Virtual, Punta Cana
Period: 7/11/21 - 11/11/21
