Cross-Lingual Phrase Retrieval

Heqi Zheng, Xiao Zhang, Zewen Chi, Heyan Huang, Tan Yan, Tian Lan, Wei Wei, Xian Ling Mao^*

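^*Corresponding author for this work

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Citations (Scopus)

Abstract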

Cross-lingual retrieval aims to retrieve relevant text across languages. Current methods typically achieve cross-lingual retrieval by learning language-agnostic text representations in word or sentence level. However, how to learn phrase representations for cross-lingual phrase retrieval is still an open problem. In this paper, we propose XPR, a cross-lingual phrase retriever that extracts phrase representations from unlabeled example sentences. Moreover, we create a large-scale cross-lingual phrase retrieval dataset, which contains 65K bilingual phrase pairs and 4.2M example sentences in 8 English-centric language pairs. Experimental results show that XPR outperforms state-of-the-art baselines which utilize word-level or sentence-level representations. XPR also shows impressive zero-shot transferability that enables the model to perform retrieval in an unseen language pair during training. Our dataset, code, and trained models are publicly available at github.com/cwszz/XPR/.
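As a rough illustration of the task described in the abstract, the following minimal sketch treats cross-lingual phrase retrieval as nearest-neighbor search: each phrase is represented by the average of vectors taken from its example sentences, and candidates in the other language are ranked by cosine similarity. This is not the XPR model or its training procedure; the toy 3-dimensional vectors, the phrase lists, and the helper names (mean_vector, cosine, retrieve) are all assumptions made for illustration.

```python
import math


def mean_vector(vectors):
    """Average a list of equal-length vectors component-wise."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, candidates):
    """Return candidate phrases sorted by similarity to the query, best first."""
    return sorted(
        candidates,
        key=lambda phrase: cosine(query_vec, candidates[phrase]),
        reverse=True,
    )


# Hypothetical hand-made vectors: one per example sentence containing the phrase.
english = {"machine learning": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]}
french = {
    "apprentissage automatique": [[0.85, 0.15, 0.05], [0.9, 0.1, 0.0]],
    "pomme de terre": [[0.0, 0.2, 0.9], [0.1, 0.1, 0.8]],
}

# Pool the example-sentence vectors into one representation per phrase.
query = mean_vector(english["machine learning"])
index = {phrase: mean_vector(vecs) for phrase, vecs in french.items()}

ranking = retrieve(query, index)
print(ranking[0])
```

In a real system the vectors would come from a multilingual encoder applied to the example sentences; the averaging and cosine ranking here only stand in for whatever pooling and scoring the actual retriever uses.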

Original language	English
Title of host publication	ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
Editors	Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Publisher	Association for Computational Linguistics (ACL)
Pages	4193-4204
Number of pages	12
ISBN (Electronic)	9781955917216
Publication status	Published - 2022
Event	60th Annual Meeting of the Association for Computational Linguistics, ACL 2022 - Dublin, Ireland. Duration: 22 May 2022 → 27 May 2022

Publication series

Name	Proceedings of the Annual Meeting of the Association for Computational Linguistics
Volume	1
ISSN (Print)	0736-587X

Conference

Conference	60th Annual Meeting of the Association for Computational Linguistics, ACL 2022
Country/Territory	Ireland
City	Dublin
Period	22/05/22 → 27/05/22

Cite this

Zheng, H., Zhang, X., Chi, Z., Huang, H., Yan, T., Lan, T., Wei, W., & Mao, X. L. (2022). Cross-Lingual Phrase Retrieval. In S. Muresan, P. Nakov, & A. Villavicencio (Eds.), ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers) (pp. 4193-4204). (Proceedings of the Annual Meeting of the Association for Computational Linguistics; Vol. 1). Association for Computational Linguistics (ACL).

Zheng, Heqi ; Zhang, Xiao ; Chi, Zewen et al. / Cross-Lingual Phrase Retrieval. ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers). editor / Smaranda Muresan ; Preslav Nakov ; Aline Villavicencio. Association for Computational Linguistics (ACL), 2022. pp. 4193-4204 (Proceedings of the Annual Meeting of the Association for Computational Linguistics).

@inproceedings{fd7ca219ac084e02a8dc312bfeb99280,
title = "Cross-Lingual Phrase Retrieval",
abstract = "Cross-lingual retrieval aims to retrieve relevant text across languages. Current methods typically achieve cross-lingual retrieval by learning language-agnostic text representations in word or sentence level. However, how to learn phrase representations for cross-lingual phrase retrieval is still an open problem. In this paper, we propose XPR, a cross-lingual phrase retriever that extracts phrase representations from unlabeled example sentences. Moreover, we create a large-scale cross-lingual phrase retrieval dataset, which contains 65K bilingual phrase pairs and 4.2M example sentences in 8 English-centric language pairs. Experimental results show that XPR outperforms state-of-the-art baselines which utilize word-level or sentence-level representations. XPR also shows impressive zero-shot transferability that enables the model to perform retrieval in an unseen language pair during training. Our dataset, code, and trained models are publicly available at github.com/cwszz/XPR/.",
author = "Heqi Zheng and Xiao Zhang and Zewen Chi and Heyan Huang and Tan Yan and Tian Lan and Wei Wei and Mao, {Xian Ling}",
note = "Publisher Copyright: {\textcopyright} 2022 Association for Computational Linguistics.; 60th Annual Meeting of the Association for Computational Linguistics, ACL 2022 ; Conference date: 22-05-2022 Through 27-05-2022",
year = "2022",
language = "English",
series = "Proceedings of the Annual Meeting of the Association for Computational Linguistics",
publisher = "Association for Computational Linguistics (ACL)",
pages = "4193--4204",
editor = "Smaranda Muresan and Preslav Nakov and Aline Villavicencio",
booktitle = "ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)",
address = "United States",
}

Zheng, H, Zhang, X, Chi, Z, Huang, H, Yan, T, Lan, T, Wei, W & Mao, XL 2022, Cross-Lingual Phrase Retrieval. in S Muresan, P Nakov & A Villavicencio (eds), ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers). Proceedings of the Annual Meeting of the Association for Computational Linguistics, vol. 1, Association for Computational Linguistics (ACL), pp. 4193-4204, 60th Annual Meeting of the Association for Computational Linguistics, ACL 2022, Dublin, Ireland, 22/05/22.

Cross-Lingual Phrase Retrieval. / Zheng, Heqi; Zhang, Xiao; Chi, Zewen et al.
ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers). editor / Smaranda Muresan; Preslav Nakov; Aline Villavicencio. Association for Computational Linguistics (ACL), 2022. pp. 4193-4204 (Proceedings of the Annual Meeting of the Association for Computational Linguistics; Vol. 1).

TY - GEN
T1 - Cross-Lingual Phrase Retrieval
AU - Zheng, Heqi
AU - Zhang, Xiao
AU - Chi, Zewen
AU - Huang, Heyan
AU - Yan, Tan
AU - Lan, Tian
AU - Wei, Wei
AU - Mao, Xian Ling
PY - 2022
Y1 - 2022
N2 - Cross-lingual retrieval aims to retrieve relevant text across languages. Current methods typically achieve cross-lingual retrieval by learning language-agnostic text representations in word or sentence level. However, how to learn phrase representations for cross-lingual phrase retrieval is still an open problem. In this paper, we propose XPR, a cross-lingual phrase retriever that extracts phrase representations from unlabeled example sentences. Moreover, we create a large-scale cross-lingual phrase retrieval dataset, which contains 65K bilingual phrase pairs and 4.2M example sentences in 8 English-centric language pairs. Experimental results show that XPR outperforms state-of-the-art baselines which utilize word-level or sentence-level representations. XPR also shows impressive zero-shot transferability that enables the model to perform retrieval in an unseen language pair during training. Our dataset, code, and trained models are publicly available at github.com/cwszz/XPR/.
AB - Cross-lingual retrieval aims to retrieve relevant text across languages. Current methods typically achieve cross-lingual retrieval by learning language-agnostic text representations in word or sentence level. However, how to learn phrase representations for cross-lingual phrase retrieval is still an open problem. In this paper, we propose XPR, a cross-lingual phrase retriever that extracts phrase representations from unlabeled example sentences. Moreover, we create a large-scale cross-lingual phrase retrieval dataset, which contains 65K bilingual phrase pairs and 4.2M example sentences in 8 English-centric language pairs. Experimental results show that XPR outperforms state-of-the-art baselines which utilize word-level or sentence-level representations. XPR also shows impressive zero-shot transferability that enables the model to perform retrieval in an unseen language pair during training. Our dataset, code, and trained models are publicly available at github.com/cwszz/XPR/.
UR - http://www.scopus.com/inward/record.url?scp=85149129710&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85149129710
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 4193
EP - 4204
BT - ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
A2 - Muresan, Smaranda
A2 - Nakov, Preslav
A2 - Villavicencio, Aline
PB - Association for Computational Linguistics (ACL)
T2 - 60th Annual Meeting of the Association for Computational Linguistics, ACL 2022
Y2 - 22 May 2022 through 27 May 2022
ER -

Zheng H, Zhang X, Chi Z, Huang H, Yan T, Lan T et al. Cross-Lingual Phrase Retrieval. In Muresan S, Nakov P, Villavicencio A, editors, ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers). Association for Computational Linguistics (ACL). 2022. p. 4193-4204. (Proceedings of the Annual Meeting of the Association for Computational Linguistics).
