Abstract
Applying BERT to text retrieval has recently brought great success; however, BERT's built-in limit on input length degrades performance on longer texts. To address this issue, we split each long text into paragraphs, which serve as the basic retrieval units. We then explore several ways to calculate pseudo labels for each query-paragraph pair: Inherit, BM25, and Vector inner product. With the annotated pseudo labels, contrastive sampling is adopted to distinguish positive from negative examples, which are fed to BERT to evaluate relevance. Experiments show that our approach is effective on TREC 2020.
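The pipeline outlined in the abstract can be illustrated with a minimal sketch. It covers only the BM25 variant of pseudo labelling and is not the authors' implementation: the blank-line paragraph splitting, the whitespace tokenisation, the `k1` and `b` defaults, and the "highest score is positive, lowest scores are negatives" sampling rule are all assumptions made for illustration, and every function name is hypothetical.

```python
import math
from collections import Counter


def split_into_paragraphs(text):
    """Split a long document on blank lines (assumed paragraph boundary)."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def bm25_scores(query, paragraphs, k1=1.2, b=0.75):
    """Score each paragraph against the query with Okapi BM25.

    The scores play the role of pseudo labels for query-paragraph pairs.
    Tokens are lowercase whitespace-separated strings (an assumption).
    """
    docs = [p.lower().split() for p in paragraphs]
    if not docs:
        return []
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: number of paragraphs containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def sample_contrastive(paragraphs, scores, num_negatives=1):
    """Hypothetical sampling rule: the top-scored paragraph is the positive
    example and the lowest-scored paragraphs are the negatives."""
    order = sorted(range(len(paragraphs)), key=lambda i: scores[i], reverse=True)
    positive = paragraphs[order[0]]
    negatives = [paragraphs[i] for i in order[-num_negatives:] if i != order[0]]
    return positive, negatives


document = (
    "BERT has an input length limit.\n\n"
    "The weather in Singapore is warm.\n\n"
    "Long text retrieval splits documents into paragraphs."
)
paragraphs = split_into_paragraphs(document)
scores = bm25_scores("BERT input length", paragraphs)
positive, negatives = sample_contrastive(paragraphs, scores)
```

In a setup like the one the abstract describes, the resulting (query, positive, negative) examples would then be passed to a BERT relevance model; that step is omitted here because the abstract does not specify it.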
Original language | English |
---|---|
Title of host publication | 2021 International Conference on Asian Language Processing, IALP 2021 |
Editors | Deyi Xiong, Ridong Jiang, Yanfeng Lu, Minghui Dong, Haizhou Li |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 95-98 |
Number of pages | 4 |
ISBN (Electronic) | 9781665483117 |
DOIs | https://doi.org/10.1109/IALP54817.2021.9675219 |
Publication status | Published - 2021 |
Event | 2021 International Conference on Asian Language Processing, IALP 2021 - Singapore, Singapore. Duration: 11 Dec 2021 → 13 Dec 2021 |
Publication series
Name | 2021 International Conference on Asian Language Processing, IALP 2021 |
---|
Conference
Conference | 2021 International Conference on Asian Language Processing, IALP 2021 |
---|---|
Country/Territory | Singapore |
City | Singapore |
Period | 11/12/21 → 13/12/21 |
Keywords
- BERT
- Contrastive Sampling
- Long Text Retrieval
- Pretrained Language Model
- Pseudo Label
Cite this
Zhu, L., Shi, S., & Huang, H. (2021). Pseudo Label based Contrastive Sampling for Long Text Retrieval. In D. Xiong, R. Jiang, Y. Lu, M. Dong, & H. Li (Eds.), 2021 International Conference on Asian Language Processing, IALP 2021 (pp. 95-98). (2021 International Conference on Asian Language Processing, IALP 2021). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IALP54817.2021.9675219