TY - GEN
T1 - Boosting legal case retrieval by query content selection with large language models
AU - Zhou, Youchao
AU - Huang, Heyan
AU - Wu, Zhijing
N1 - Publisher Copyright:
© 2023 ACM.
PY - 2023/11/26
Y1 - 2023/11/26
AB - Legal case retrieval, which aims to retrieve cases relevant to a given query case, benefits judicial justice and has attracted increasing attention. Unlike generic retrieval queries, legal case queries are typically long, and the definition of relevance is closely tied to legal-specific elements. As a result, legal case queries may suffer from noise and sparsity of salient content, which hinders retrieval models from perceiving the correct information in a query. While previous studies have focused on improving retrieval models and understanding relevance judgments, we focus on enhancing legal case retrieval by exploiting the salient content in legal case queries. We first manually annotate the salient content in queries and investigate how sparse and dense retrieval models attend to that content. We then experiment with various query content selection methods that use large language models (LLMs) to extract or summarize salient content and incorporate it into the retrieval models. Experimental results show that reformulating long queries with LLMs improves the performance of both sparse and dense models in legal case retrieval.
KW - Content selection
KW - Large language models
KW - Legal case retrieval
KW - Query reformulation
UR - http://www.scopus.com/inward/record.url?scp=85180130645&partnerID=8YFLogxK
U2 - 10.1145/3624918.3625328
DO - 10.1145/3624918.3625328
M3 - Conference contribution
AN - SCOPUS:85180130645
T3 - SIGIR-AP 2023 - Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region
SP - 176
EP - 184
BT - SIGIR-AP 2023 - Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region
PB - Association for Computing Machinery, Inc
T2 - 11th International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, SIGIR-AP 2023
Y2 - 26 November 2023 through 28 November 2023
ER -