Steering Large Language Models for Cross-lingual Information Retrieval

Ping Guo, Yubing Ren, Yue Hu*, Yanan Cao, Yunpeng Li, Heyan Huang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

1 Citation (Scopus)

Abstract

In today's digital age, accessing information across language barriers poses a significant challenge, with conventional search systems often struggling to interpret and retrieve multilingual content accurately. Addressing this issue, our study introduces a novel integration of applying Large Language Models (LLMs) as Cross-lingual Readers in information retrieval systems, specifically targeting the complexities of cross-lingual information retrieval (CLIR). We present an innovative approach: Activation Steered Multilingual Retrieval (ASMR) that employs "steering activations''-a method to adjust and direct the LLM's focus-enhancing its ability to understand user queries and generate accurate, language-coherent responses. ASMR adeptly combines a Multilingual Dense Passage Retrieval (mDPR) system with an LLM, overcoming the limitations of traditional search engines in handling diverse linguistic inputs. This approach is particularly effective in managing the nuances and intricacies inherent in various languages. Rigorous testing on established benchmarks such as XOR-TyDi QA, and MKQA demonstrates that ASMR not only meets but surpasses existing standards in CLIR, achieving state-of-the-art performance. The results of our research hold significant implications for understanding the inherent features of how LLMs understand and generate natural languages, offering an attempt towards more inclusive, effective, and linguistically diverse information access on a global scale.

Original languageEnglish
Title of host publicationSIGIR 2024 - Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval
PublisherAssociation for Computing Machinery, Inc
Pages585-596
Number of pages12
ISBN (Electronic)9798400704314
DOIs
Publication statusPublished - 11 Jul 2024
Event47th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024 - Washington, United States
Duration: 14 Jul 202418 Jul 2024

Publication series

NameSIGIR 2024 - Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval

Conference

Conference47th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024
Country/TerritoryUnited States
CityWashington
Period14/07/2418/07/24

Keywords

  • activation steering
  • cross-lingual information retrieval
  • large language models

Cite this