TY - GEN
T1 - Boosting legal case retrieval by query content selection with large language models
AU - Zhou, Youchao
AU - Huang, Heyan
AU - Wu, Zhijing
N1 - Publisher Copyright:
© 2023 ACM.
PY - 2023/11/26
Y1 - 2023/11/26
AB - Legal case retrieval, which aims to retrieve cases relevant to a given query case, benefits judicial justice and has attracted increasing attention. Unlike generic retrieval queries, legal case queries are typically long, and the definition of relevance is closely tied to legal-specific elements. As a result, legal case queries may suffer from noise and sparsity of salient content, which hinders retrieval models from perceiving the correct information in a query. While previous studies have focused on improving retrieval models and understanding relevance judgments, we focus on enhancing legal case retrieval by exploiting the salient content in legal case queries. We first manually annotate the salient content in queries and investigate how sparse and dense retrieval models attend to that content. We then experiment with various query content selection methods that use large language models (LLMs) to extract or summarize salient content and incorporate it into the retrieval models. Experimental results show that reformulating long queries with LLMs improves the performance of both sparse and dense models in legal case retrieval.
KW - Content selection
KW - Large language models
KW - Legal case retrieval
KW - Query reformulation
UR - http://www.scopus.com/inward/record.url?scp=85180130645&partnerID=8YFLogxK
U2 - 10.1145/3624918.3625328
DO - 10.1145/3624918.3625328
M3 - Conference contribution
AN - SCOPUS:85180130645
T3 - SIGIR-AP 2023 - Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region
SP - 176
EP - 184
BT - SIGIR-AP 2023 - Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region
PB - Association for Computing Machinery, Inc
T2 - 11th International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, SIGIR-AP 2023
Y2 - 26 November 2023 through 28 November 2023
ER -