TY - GEN
T1 - Bias-variance decomposition of IR evaluation
AU - Zhang, Peng
AU - Song, Dawei
AU - Wang, Jun
AU - Hou, Yuexian
PY - 2013
Y1 - 2013
N2 - It has been recognized that, when an information retrieval (IR) system achieves improvement in mean retrieval effectiveness (e.g., mean average precision (MAP)) over all the queries, the performance (e.g., average precision (AP)) of some individual queries could be hurt, resulting in retrieval instability. Some stability/robustness metrics have been proposed. However, they are often defined separately from the mean effectiveness metric. Consequently, there is a lack of a unified formulation of effectiveness, stability and overall retrieval quality (considering both). In this paper, we present a unified formulation based on the bias-variance decomposition. Correspondingly, a novel evaluation methodology is developed to evaluate the effectiveness and stability in an integrated manner. A case study applying the proposed methodology to evaluation of query language modeling illustrates the usefulness and analytical power of our approach.
AB - It has been recognized that, when an information retrieval (IR) system achieves improvement in mean retrieval effectiveness (e.g., mean average precision (MAP)) over all the queries, the performance (e.g., average precision (AP)) of some individual queries could be hurt, resulting in retrieval instability. Some stability/robustness metrics have been proposed. However, they are often defined separately from the mean effectiveness metric. Consequently, there is a lack of a unified formulation of effectiveness, stability and overall retrieval quality (considering both). In this paper, we present a unified formulation based on the bias-variance decomposition. Correspondingly, a novel evaluation methodology is developed to evaluate the effectiveness and stability in an integrated manner. A case study applying the proposed methodology to evaluation of query language modeling illustrates the usefulness and analytical power of our approach.
KW - Bias-Variance
KW - Decomposition
KW - Effectiveness
KW - Evaluation
KW - Robustness
KW - Stability
UR - http://www.scopus.com/inward/record.url?scp=84883130585&partnerID=8YFLogxK
U2 - 10.1145/2484028.2484127
DO - 10.1145/2484028.2484127
M3 - Conference contribution
AN - SCOPUS:84883130585
SN - 9781450320344
T3 - SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval
SP - 1021
EP - 1024
BT - SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval
T2 - 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2013
Y2 - 28 July 2013 through 1 August 2013
ER -