Generative Dense Retrieval: Memory Can Be a Burden

Peiwen Yuan, Xinglin Wang, Shaoxiong Feng, Boyuan Pan, Yiwei Li, Heda Wang, Xupeng Miao, Kan Li*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

Abstract

Generative Retrieval (GR), autoregressively decoding relevant document identifiers given a query, has been shown to perform well under the setting of small-scale corpora. By memorizing the document corpus with model parameters, GR implicitly achieves deep interaction between query and document. However, such a memorizing mechanism faces three drawbacks: (1) Poor memory accuracy for fine-grained features of documents; (2) Memory confusion gets worse as the corpus size increases; (3) Huge memory update costs for new documents. To alleviate these problems, we propose the Generative Dense Retrieval (GDR) paradigm. Specifically, GDR first uses the limited memory volume to achieve inter-cluster matching from query to relevant document clusters. The memorizing-free matching mechanism from Dense Retrieval (DR) is then introduced to conduct fine-grained intra-cluster matching from clusters to relevant documents. The coarse-to-fine process maximizes the advantages of GR's deep interaction and DR's scalability. Besides, we design a cluster identifier constructing strategy to facilitate corpus memory and a cluster-adaptive negative sampling strategy to enhance the intra-cluster mapping ability. Empirical results show that GDR obtains an average improvement of 3.0 R@100 on the NQ dataset under multiple settings and has better scalability.
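
The coarse-to-fine process described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `coarse_to_fine_retrieve`, its parameters, and the toy vectors are all hypothetical. The inter-cluster stage is stood in for by precomputed cluster scores (in GDR these would come from a generative model decoding cluster identifiers), and the intra-cluster stage is assumed to be a plain inner product between query and document embeddings.

```python
from typing import Dict, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def coarse_to_fine_retrieve(
    query_vec: Sequence[float],
    cluster_scores: Dict[str, float],
    cluster_docs: Dict[str, Dict[str, Sequence[float]]],
    top_clusters: int = 2,
    top_docs: int = 3,
) -> List[Tuple[str, float]]:
    """Hypothetical two-stage retrieval: pick clusters, then rank documents inside them."""
    # Stage 1 (inter-cluster): keep the highest-scoring clusters.
    # Assumption: scores are given; a generative model would produce them in GDR.
    selected = sorted(
        cluster_scores, key=lambda c: cluster_scores[c], reverse=True
    )[:top_clusters]

    # Stage 2 (intra-cluster): dense matching restricted to the selected clusters.
    candidates: List[Tuple[str, float]] = []
    for cluster_id in selected:
        for doc_id, doc_vec in cluster_docs.get(cluster_id, {}).items():
            candidates.append((doc_id, dot(query_vec, doc_vec)))

    # Rank the surviving candidates by their dense similarity score.
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:top_docs]


# Toy data (invented for illustration only).
cluster_scores = {"c0": 0.7, "c1": 0.2, "c2": 0.1}
cluster_docs = {
    "c0": {"d1": [1.0, 0.0], "d2": [0.5, 0.5]},
    "c1": {"d3": [0.0, 1.0]},
    "c2": {"d4": [2.0, 2.0]},
}
results = coarse_to_fine_retrieve(
    [1.0, 0.0], cluster_scores, cluster_docs, top_clusters=2, top_docs=2
)
print(results)
```

In this toy run, document `d4` has the largest inner product with the query but is never scored, because its cluster `c2` is pruned in the first stage. That is the trade-off of any coarse-to-fine scheme: the fine-grained stage can only recover documents whose cluster survived the coarse stage.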

Original language: English
Title of host publication: EACL 2024 - 18th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference
Editors: Yvette Graham, Matthew Purver
Publisher: Association for Computational Linguistics (ACL)
Pages: 2835-2845
Number of pages: 11
ISBN (electronic): 9798891760882
Publication status: Published - 2024
Event: 18th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2024 - St. Julian's, Malta
Duration: 17 Mar 2024 to 22 Mar 2024

Publication series

Name: EACL 2024 - 18th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference
Volume: 1

Conference

Conference: 18th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2024
Country/Territory: Malta
City: St. Julian's
Period: 17/03/24 to 22/03/24
