Pseudo Label based Contrastive Sampling for Long Text Retrieval

Le Zhu, Shumin Shi*, Heyan Huang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Applying BERT to text retrieval has recently brought great success; however, BERT's built-in limit on input length degrades its performance on longer texts. To address this issue, we split each long text into paragraphs, which serve as the basic retrieval units. We then explore three ways to calculate pseudo labels for each query-paragraph pair: Inherit, BM25, and vector inner product. With the annotated pseudo labels, contrastive sampling is adopted to distinguish positive from negative examples, which are then fed to BERT to evaluate relevance. Experiments show that our approach is effective on TREC 2020.
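The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the paragraph size, the embedding model, or the sampling rule, so the following are assumptions made for illustration only: fixed-size word windows as paragraphs, bag-of-words count vectors standing in for learned embeddings, the standard Okapi BM25 formula computed over the paragraphs of a single document, and a sampling rule that takes the highest-scored paragraph as the positive and the lowest-scored ones as negatives. All function names are hypothetical.

```python
import math
from collections import Counter


def split_paragraphs(text, max_words=50):
    """Split a long text into word windows (window size is an assumption)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def inherit_labels(doc_label, paragraphs):
    """'Inherit': every paragraph takes the label of its parent document."""
    return [float(doc_label)] * len(paragraphs)


def bm25_labels(query, paragraphs, k1=1.2, b=0.75):
    """Score each paragraph against the query with standard Okapi BM25."""
    docs = [p.lower().split() for p in paragraphs]
    if not docs:
        return []
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    query_terms = set(query.lower().split())
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            df = sum(1 for other in docs if term in other)
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def inner_product_labels(query, paragraphs):
    """Inner product of count vectors (a stand-in for learned embeddings)."""
    q = Counter(query.lower().split())
    return [
        float(sum(q[t] * c for t, c in Counter(p.lower().split()).items()))
        for p in paragraphs
    ]


def contrastive_sample(paragraphs, labels, n_neg=2):
    """Assumed rule: top-scored paragraph is positive, lowest-scored are negatives."""
    order = sorted(range(len(paragraphs)), key=lambda i: labels[i], reverse=True)
    positive = paragraphs[order[0]]
    negatives = [paragraphs[i] for i in order[-n_neg:] if i != order[0]]
    return positive, negatives
```

In this sketch, the (query, positive) and (query, negative) pairs returned by `contrastive_sample` would be the inputs to a BERT relevance classifier; that training step is omitted here because the abstract gives no details about it.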

Original language: English
Title of host publication: 2021 International Conference on Asian Language Processing, IALP 2021
Editors: Deyi Xiong, Ridong Jiang, Yanfeng Lu, Minghui Dong, Haizhou Li
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 95-98
Number of pages: 4
ISBN (Electronic): 9781665483117
DOIs: https://doi.org/10.1109/IALP54817.2021.9675219
Publication status: Published - 2021
Event: 2021 International Conference on Asian Language Processing, IALP 2021 - Singapore, Singapore
Duration: 11 Dec 2021 to 13 Dec 2021

Publication series

Name: 2021 International Conference on Asian Language Processing, IALP 2021

Conference

Conference: 2021 International Conference on Asian Language Processing, IALP 2021
Country/Territory: Singapore
City: Singapore
Period: 11/12/21 to 13/12/21

Keywords

  • BERT
  • Contrastive Sampling
  • Long Text Retrieval
  • Pretrained Language Model
  • Pseudo Label

Cite this

Zhu, L., Shi, S., & Huang, H. (2021). Pseudo Label based Contrastive Sampling for Long Text Retrieval. In D. Xiong, R. Jiang, Y. Lu, M. Dong, & H. Li (Eds.), 2021 International Conference on Asian Language Processing, IALP 2021 (pp. 95-98). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IALP54817.2021.9675219