TY - JOUR
T1 - A bias–variance evaluation framework for information retrieval systems
AU - Zhang, Peng
AU - Gao, Hui
AU - Hu, Zeting
AU - Yang, Meng
AU - Song, Dawei
AU - Wang, Jun
AU - Hou, Yuexian
AU - Hu, Bin
N1 - Publisher Copyright:
© 2021
PY - 2022/1
Y1 - 2022/1
AB - In information retrieval (IR), improving the effectiveness of a system often sacrifices its stability. To evaluate stability, many risk-sensitive metrics have been proposed. Owing to theoretical limitations, existing works study effectiveness and stability separately and have not explored the effectiveness–stability tradeoff. In this paper, we propose a Bias–Variance Tradeoff Evaluation (BV-Test) framework, based on the bias–variance decomposition of the mean squared error, to measure both the overall performance of a system (considering effectiveness and stability together) and its tradeoff between effectiveness and stability. Within this framework, we define generalized bias–variance metrics for the Cranfield-style experimental set-up in which the document collection is fixed (across topics) and for the set-up in which the document collection is a sample (per-topic). Compared with risk-sensitive evaluation methods, our work not only measures the effectiveness–stability tradeoff of a system, but also effectively traces the source of system instability. Experiments on the TREC Ad-hoc track (1993–1999) and Web track (2010–2014) show a clear effectiveness–stability tradeoff, both across topics and per-topic, and demonstrate that topic grouping and max–min normalization can effectively reduce the bias–variance tradeoff. Experimental results on the TREC Session track (2010–2012) also show that query reformulation and an increase in user data benefit both effectiveness and stability simultaneously.
KW - Effectiveness–stability tradeoff
KW - Evaluation metrics
KW - Information retrieval
UR - http://www.scopus.com/inward/record.url?scp=85116514109&partnerID=8YFLogxK
U2 - 10.1016/j.ipm.2021.102747
DO - 10.1016/j.ipm.2021.102747
M3 - Article
AN - SCOPUS:85116514109
SN - 0306-4573
VL - 59
JO - Information Processing and Management
JF - Information Processing and Management
IS - 1
M1 - 102747
ER -