Pseudo Label based Contrastive Sampling for Long Text Retrieval

Le Zhu, Shumin Shi*, Heyan Huang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Applying BERT to text retrieval has recently brought great success; however, BERT's inherent input-length limit degrades its performance on longer texts. To address this issue, we split each long text into paragraphs, which serve as the basic retrieval units. We then explore three ways to compute a pseudo label for each query-paragraph pair: Inherit, BM25, and vector inner product. Using these pseudo labels, we apply contrastive sampling to separate positive from negative examples, which are then fed to BERT to evaluate relevance. Experiments on TREC 2020 show that our approach is effective.
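The pipeline outlined in the abstract can be illustrated with a minimal sketch. The abstract does not specify how each step is implemented, so everything below is an assumption made for illustration: paragraphs are split on blank lines, BM25 uses a common Okapi variant with default `k1` and `b`, bag-of-words count vectors stand in for whatever embeddings the authors use for the inner product, and the sampling rule (highest-scoring paragraph as the positive, lowest-scoring ones as negatives) is one plausible reading of "contrastive sampling". All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep word characters only (assumed tokenizer).
    return re.findall(r"\w+", text.lower())


def split_paragraphs(document):
    # Assumption: paragraphs are separated by blank lines.
    return [p.strip() for p in document.split("\n\n") if p.strip()]


def inherit_labels(paragraphs, doc_label):
    # "Inherit": each paragraph copies the document-level relevance label.
    return [float(doc_label)] * len(paragraphs)


def bm25_labels(query, paragraphs, k1=1.2, b=0.75):
    # Okapi BM25 computed over the paragraphs of a single document.
    docs = [tokenize(p) for p in paragraphs]
    if not docs:
        return []
    n = len(docs)
    avg_len = (sum(len(d) for d in docs) / n) or 1.0
    doc_freq = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def inner_product_labels(query, paragraphs):
    # Assumption: term-count vectors replace learned dense embeddings.
    q = Counter(tokenize(query))
    return [
        float(sum(q[t] * c for t, c in Counter(tokenize(p)).items()))
        for p in paragraphs
    ]


def contrastive_sample(paragraphs, labels, num_negatives=1):
    # Assumed rule: top-scoring paragraph is the positive example,
    # the lowest-scoring remaining paragraphs are the negatives.
    ranked = sorted(range(len(paragraphs)), key=lambda i: labels[i], reverse=True)
    positive = paragraphs[ranked[0]]
    rest = ranked[1:]
    start = max(0, len(rest) - num_negatives)
    negatives = [paragraphs[i] for i in rest[start:]]
    return positive, negatives


if __name__ == "__main__":
    document = (
        "BERT has an input length limit.\n\n"
        "Cooking pasta takes ten minutes.\n\n"
        "Long text retrieval splits documents into paragraphs for BERT."
    )
    query = "BERT long text retrieval"
    paragraphs = split_paragraphs(document)
    labels = bm25_labels(query, paragraphs)
    positive, negatives = contrastive_sample(paragraphs, labels)
    print("positive:", positive)
    print("negatives:", negatives)
```

In a full system, the resulting (query, positive, negative) examples would be passed to a BERT relevance model for training or scoring; that step is omitted here because the abstract gives no details about it.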

Original language: English
Title of host publication: 2021 International Conference on Asian Language Processing, IALP 2021
Editors: Deyi Xiong, Ridong Jiang, Yanfeng Lu, Minghui Dong, Haizhou Li
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 95-98
Number of pages: 4
ISBN (electronic): 9781665483117
DOI
Publication status: Published - 2021
Event: 2021 International Conference on Asian Language Processing, IALP 2021 - Singapore, Singapore
Duration: 11 Dec 2021 to 13 Dec 2021

Publication series

Name: 2021 International Conference on Asian Language Processing, IALP 2021

Conference

Conference: 2021 International Conference on Asian Language Processing, IALP 2021
Country/Territory: Singapore
City: Singapore
Period: 11/12/21 to 13/12/21
