TY - JOUR
T1 - Bias-variance analysis in estimating true query model for information retrieval
AU - Zhang, Peng
AU - Song, Dawei
AU - Wang, Jun
AU - Hou, Yuexian
PY - 2014
Y1 - 2014
N2 - The estimation of query model is an important task in language modeling (LM) approaches to information retrieval (IR). The ideal estimation is expected to be not only effective in terms of high mean retrieval performance over all queries, but also stable in terms of low variance of retrieval performance across different queries. In practice, however, improving effectiveness can sacrifice stability, and vice versa. In this paper, we propose to study this tradeoff from a new perspective, i.e., the bias-variance tradeoff, which is a fundamental theory in statistics. We formulate the notion of bias-variance regarding retrieval performance and estimation quality of query models. We then investigate several estimated query models, by analyzing when and why the bias-variance tradeoff will occur, and how the bias and variance can be reduced simultaneously. A series of experiments on four TREC collections have been conducted to systematically evaluate our bias-variance analysis. Our approach and results will potentially form an analysis framework and a novel evaluation strategy for query language modeling.
AB - The estimation of query model is an important task in language modeling (LM) approaches to information retrieval (IR). The ideal estimation is expected to be not only effective in terms of high mean retrieval performance over all queries, but also stable in terms of low variance of retrieval performance across different queries. In practice, however, improving effectiveness can sacrifice stability, and vice versa. In this paper, we propose to study this tradeoff from a new perspective, i.e., the bias-variance tradeoff, which is a fundamental theory in statistics. We formulate the notion of bias-variance regarding retrieval performance and estimation quality of query models. We then investigate several estimated query models, by analyzing when and why the bias-variance tradeoff will occur, and how the bias and variance can be reduced simultaneously. A series of experiments on four TREC collections have been conducted to systematically evaluate our bias-variance analysis. Our approach and results will potentially form an analysis framework and a novel evaluation strategy for query language modeling.
KW - Bias-variance
KW - Information retrieval
KW - Query language model
UR - http://www.scopus.com/inward/record.url?scp=84885100338&partnerID=8YFLogxK
U2 - 10.1016/j.ipm.2013.08.004
DO - 10.1016/j.ipm.2013.08.004
M3 - Article
AN - SCOPUS:84885100338
SN - 0306-4573
VL - 50
SP - 199
EP - 217
JO - Information Processing and Management
JF - Information Processing and Management
IS - 1
ER -