A bias–variance evaluation framework for information retrieval systems

Peng Zhang*, Hui Gao, Zeting Hu, Meng Yang, Dawei Song, Jun Wang, Yuexian Hou, Bin Hu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

In information retrieval (IR), improving the effectiveness of a system often sacrifices its stability. Many risk-sensitive metrics have been proposed to evaluate stability. Owing to theoretical limitations, however, existing work studies effectiveness and stability separately and has not explored the effectiveness–stability tradeoff. In this paper, we propose a Bias–Variance Tradeoff Evaluation (BV-Test) framework, based on the bias–variance decomposition of the mean squared error, to measure both the overall performance of a system (covering effectiveness and stability) and the tradeoff between the two. Within this framework, we define generalized bias–variance metrics for the Cranfield-style experimental set-up in which the document collection is fixed (across topics), as well as for the set-up in which the document collection is treated as a sample (per topic). Compared with risk-sensitive evaluation methods, our work not only measures the effectiveness–stability tradeoff of a system but also effectively traces the source of system instability. Experiments on the TREC Ad-hoc track (1993–1999) and Web track (2010–2014) show a clear effectiveness–stability tradeoff both across topics and per topic, and show that topic grouping and max–min normalization can effectively reduce the bias–variance tradeoff. Experimental results on the TREC Session track (2010–2012) further show that query reformulation and an increased amount of user data benefit effectiveness and stability simultaneously.
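The decomposition underlying BV-Test can be illustrated with the classical identity MSE = bias² + variance applied to per-topic effectiveness scores. The Python sketch below is an illustration of that classical decomposition only, not the paper's generalized metrics: the per-topic average-precision scores, the per-topic ideal reference, and the function name are hypothetical assumptions.

import numpy as np

def bias_variance_decompose(scores, ideal):
    """Decompose the mean squared error of per-topic effectiveness scores
    (e.g., average precision) against a per-topic reference so that
    mse == bias**2 + variance."""
    errors = np.asarray(scores, dtype=float) - np.asarray(ideal, dtype=float)
    mse = np.mean(errors ** 2)   # overall performance
    bias = np.mean(errors)       # systematic gap to the reference (effectiveness)
    variance = np.var(errors)    # spread across topics (instability)
    return mse, bias ** 2, variance

# Hypothetical per-topic AP scores for one system and an ideal reference.
system_ap = [0.41, 0.58, 0.33, 0.72, 0.49]
ideal_ap = [0.80, 0.75, 0.60, 0.90, 0.70]
mse, bias_sq, var = bias_variance_decompose(system_ap, ideal_ap)
assert abs(mse - (bias_sq + var)) < 1e-12

Under this view, two systems with the same MSE can differ in character: one is consistently mediocre (high bias², low variance), the other erratic (low bias², high variance), which is the tradeoff the framework is designed to expose.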

Original language: English
Article number: 102747
Journal: Information Processing and Management
Volume: 59
Issue number: 1
Publication status: Published - Jan 2022

Keywords

  • Effectiveness–stability tradeoff
  • Evaluation metrics
  • Information retrieval
