Cross-Lingual Phrase Retrieval

Heqi Zheng, Xiao Zhang, Zewen Chi, Heyan Huang, Tan Yan, Tian Lan, Wei Wei, Xian Ling Mao*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

4 Citations (Scopus)

Abstract

Cross-lingual retrieval aims to retrieve relevant text across languages. Current methods typically achieve cross-lingual retrieval by learning language-agnostic text representations at the word or sentence level. However, how to learn phrase representations for cross-lingual phrase retrieval is still an open problem. In this paper, we propose XPR, a cross-lingual phrase retriever that extracts phrase representations from unlabeled example sentences. Moreover, we create a large-scale cross-lingual phrase retrieval dataset, which contains 65K bilingual phrase pairs and 4.2M example sentences in 8 English-centric language pairs. Experimental results show that XPR outperforms state-of-the-art baselines that utilize word-level or sentence-level representations. XPR also shows impressive zero-shot transferability, which enables the model to perform retrieval in a language pair unseen during training. Our dataset, code, and trained models are publicly available at github.com/cwszz/XPR/.
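
As a rough illustration of the retrieval setting described in the abstract, the sketch below represents each phrase by averaging vectors for its example sentences and then ranks target-language phrases by cosine similarity to the query phrase. This is not the XPR model: the averaging step, the toy 3-dimensional vectors, the sample phrases, and the names `phrase_vector`, `cosine`, and `retrieve` are all assumptions made purely for illustration.

```python
import math


def phrase_vector(example_vectors):
    """Average a phrase's per-example vectors into one representation.

    Assumption: each example sentence has already been encoded into a
    fixed-size vector by some encoder (not shown here).
    """
    dim = len(example_vectors[0])
    count = len(example_vectors)
    return [sum(vec[i] for vec in example_vectors) / count for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, candidates):
    """Return candidate phrases sorted by descending similarity to the query."""
    return sorted(
        candidates,
        key=lambda phrase: cosine(query_vec, candidates[phrase]),
        reverse=True,
    )


# Toy data: hypothetical vectors standing in for encoded example sentences.
query = phrase_vector([[0.9, 0.1, 0.0], [0.7, 0.3, 0.0]])
candidates = {
    "pomme de terre": phrase_vector([[0.8, 0.2, 0.1]]),
    "chemin de fer": phrase_vector([[0.0, 0.1, 0.9], [0.1, 0.0, 0.8]]),
}
ranking = retrieve(query, candidates)
```

Here `ranking` lists the candidate phrases from most to least similar to the query. A real system would replace the toy vectors with representations produced by a trained multilingual encoder.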

Original language: English
Title of host publication: ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
Editors: Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Publisher: Association for Computational Linguistics (ACL)
Pages: 4193-4204
Number of pages: 12
ISBN (Electronic): 9781955917216
Publication status: Published - 2022
Event: 60th Annual Meeting of the Association for Computational Linguistics, ACL 2022 - Dublin, Ireland
Duration: 22 May 2022 to 27 May 2022

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
Volume: 1
ISSN (Print): 0736-587X

Conference

Conference: 60th Annual Meeting of the Association for Computational Linguistics, ACL 2022
Country/Territory: Ireland
City: Dublin
Period: 22/05/22 to 27/05/22
