TY - GEN
T1 - Unsupervised Large Language Model Alignment for Information Retrieval via Contrastive Feedback
AU - Dong, Qian
AU - Liu, Yiding
AU - Ai, Qingyao
AU - Wu, Zhijing
AU - Li, Haitao
AU - Liu, Yiqun
AU - Wang, Shuaiqiang
AU - Yin, Dawei
AU - Ma, Shaoping
N1 - Publisher Copyright:
© 2024 Owner/Author.
PY - 2024/7/10
Y1 - 2024/7/10
N2 - Large language models (LLMs) have demonstrated remarkable capabilities across various research domains, including the field of Information Retrieval (IR). However, the responses generated by off-the-shelf LLMs tend to be generic, i.e., they cannot capture the distinctiveness of each document among documents with similar content. This limits the performance of LLMs in IR because finding and distinguishing relevant documents from substantially similar documents is a typical problem in many IR tasks. To address this issue, we propose an unsupervised alignment method, namely Reinforcement Learning from Contrastive Feedback (RLCF), which empowers LLMs to generate both high-quality and context-specific responses. Our approach constructs unsupervised contrastive feedback signals based on groups of similar documents and adopts a reward function, named group-wise reciprocal rank, to optimize LLMs. We conduct extensive experiments to evaluate the effectiveness of RLCF.
AB - Large language models (LLMs) have demonstrated remarkable capabilities across various research domains, including the field of Information Retrieval (IR). However, the responses generated by off-the-shelf LLMs tend to be generic, i.e., they cannot capture the distinctiveness of each document among documents with similar content. This limits the performance of LLMs in IR because finding and distinguishing relevant documents from substantially similar documents is a typical problem in many IR tasks. To address this issue, we propose an unsupervised alignment method, namely Reinforcement Learning from Contrastive Feedback (RLCF), which empowers LLMs to generate both high-quality and context-specific responses. Our approach constructs unsupervised contrastive feedback signals based on groups of similar documents and adopts a reward function, named group-wise reciprocal rank, to optimize LLMs. We conduct extensive experiments to evaluate the effectiveness of RLCF.
KW - alignment
KW - information retrieval
KW - large language models
UR - http://www.scopus.com/inward/record.url?scp=85200566667&partnerID=8YFLogxK
U2 - 10.1145/3626772.3657689
DO - 10.1145/3626772.3657689
M3 - Conference contribution
AN - SCOPUS:85200566667
T3 - SIGIR 2024 - Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval
SP - 48
EP - 58
BT - SIGIR 2024 - Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval
PB - Association for Computing Machinery, Inc
T2 - 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024
Y2 - 14 July 2024 through 18 July 2024
ER -