Query in Your Tongue: Reinforce Large Language Models with Retrievers for Cross-lingual Search Generative Experience

Ping Guo, Yue Hu*, Yanan Cao, Yubing Ren, Yunpeng Li, Heyan Huang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

2 Citations (Scopus)

Abstract

In the contemporary digital landscape, search engines play an invaluable role in information access, yet they often face challenges in Cross-Lingual Information Retrieval (CLIR). Though attempts are made to improve CLIR, current methods still leave users grappling with issues such as misplaced named entities and lost cultural context when querying in non-native languages. While some advances have been made using Neural Machine Translation models and cross-lingual representation, these are not without limitations. Enter the paradigm shift brought about by Large Language Models (LLMs), which have transformed search engines from simple retrievers to generators of contextually relevant information. This paper introduces the Multilingual Information Model for Intelligent Retrieval (MIMIR). Built on the power of LLMs, MIMIR directly responds in the language of the user's query, reducing the need for post-search translations. Our model's architecture encompasses a dual-module system: a retriever for searching multilingual documents and a responder for crafting answers in the user's desired language. Through a unique unified training framework, with the retriever serving as a reward model supervising the responder, and in turn, the responder producing synthetic data to refine the retriever's proficiency, MIMIR's retriever and responder iteratively enhance each other. Performance evaluations via CLEF and MKQA benchmarks reveal MIMIR's superiority over existing models, effectively addressing traditional CLIR challenges.

Original languageEnglish
Title of host publicationWWW 2024 - Proceedings of the ACM Web Conference
PublisherAssociation for Computing Machinery, Inc
Pages1529-1538
Number of pages10
ISBN (Electronic)9798400701719
DOIs
Publication statusPublished - 13 May 2024
Event33rd ACM Web Conference, WWW 2024 - Singapore, Singapore
Duration: 13 May 202417 May 2024

Publication series

NameWWW 2024 - Proceedings of the ACM Web Conference

Conference

Conference33rd ACM Web Conference, WWW 2024
Country/TerritorySingapore
CitySingapore
Period13/05/2417/05/24

Keywords

  • cross-lingual information retrieval
  • large language models
  • search generative experience

Cite this