Bias-variance analysis in estimating true query model for information retrieval

Peng Zhang; Dawei Song; Jun Wang; Yuexian Hou

doi:10.1016/j.ipm.2013.08.004

Peng Zhang, Dawei Song^*, Jun Wang, Yuexian Hou

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

13 Citations (Scopus)

Abstract

The estimation of query model is an important task in language modeling (LM) approaches to information retrieval (IR). The ideal estimation is expected to be not only effective in terms of high mean retrieval performance over all queries, but also stable in terms of low variance of retrieval performance across different queries. In practice, however, improving effectiveness can sacrifice stability, and vice versa. In this paper, we propose to study this tradeoff from a new perspective, i.e., the bias-variance tradeoff, which is a fundamental theory in statistics. We formulate the notion of bias-variance regarding retrieval performance and estimation quality of query models. We then investigate several estimated query models, by analyzing when and why the bias-variance tradeoff will occur, and how the bias and variance can be reduced simultaneously. A series of experiments on four TREC collections have been conducted to systematically evaluate our bias-variance analysis. Our approach and results will potentially form an analysis framework and a novel evaluation strategy for query language modeling.

Original language	English
Pages (from-to)	199-217
Number of pages	19
Journal	Information Processing and Management
Volume	50
Issue number	1
DOI	https://doi.org/10.1016/j.ipm.2013.08.004
Publication status	Published - 2014
Externally published	Yes

Cite this

Zhang, P., Song, D., Wang, J., & Hou, Y. (2014). Bias-variance analysis in estimating true query model for information retrieval. Information Processing and Management, 50(1), 199-217. https://doi.org/10.1016/j.ipm.2013.08.004

@article{77036d656a4f4c0180ab7302a436d472,
title = "Bias-variance analysis in estimating true query model for information retrieval",
abstract = "The estimation of query model is an important task in language modeling (LM) approaches to information retrieval (IR). The ideal estimation is expected to be not only effective in terms of high mean retrieval performance over all queries, but also stable in terms of low variance of retrieval performance across different queries. In practice, however, improving effectiveness can sacrifice stability, and vice versa. In this paper, we propose to study this tradeoff from a new perspective, i.e., the bias-variance tradeoff, which is a fundamental theory in statistics. We formulate the notion of bias-variance regarding retrieval performance and estimation quality of query models. We then investigate several estimated query models, by analyzing when and why the bias-variance tradeoff will occur, and how the bias and variance can be reduced simultaneously. A series of experiments on four TREC collections have been conducted to systematically evaluate our bias-variance analysis. Our approach and results will potentially form an analysis framework and a novel evaluation strategy for query language modeling.",
keywords = "Bias-variance, Information retrieval, Query language model",
author = "Peng Zhang and Dawei Song and Jun Wang and Yuexian Hou",
year = "2014",
doi = "10.1016/j.ipm.2013.08.004",
language = "English",
volume = "50",
pages = "199--217",
journal = "Information Processing and Management",
issn = "0306-4573",
publisher = "Elsevier Ltd.",
number = "1",
}

TY  - JOUR
T1  - Bias-variance analysis in estimating true query model for information retrieval
AU  - Zhang, Peng
AU  - Song, Dawei
AU  - Wang, Jun
AU  - Hou, Yuexian
PY  - 2014
Y1  - 2014
N2  - The estimation of query model is an important task in language modeling (LM) approaches to information retrieval (IR). The ideal estimation is expected to be not only effective in terms of high mean retrieval performance over all queries, but also stable in terms of low variance of retrieval performance across different queries. In practice, however, improving effectiveness can sacrifice stability, and vice versa. In this paper, we propose to study this tradeoff from a new perspective, i.e., the bias-variance tradeoff, which is a fundamental theory in statistics. We formulate the notion of bias-variance regarding retrieval performance and estimation quality of query models. We then investigate several estimated query models, by analyzing when and why the bias-variance tradeoff will occur, and how the bias and variance can be reduced simultaneously. A series of experiments on four TREC collections have been conducted to systematically evaluate our bias-variance analysis. Our approach and results will potentially form an analysis framework and a novel evaluation strategy for query language modeling.
AB  - The estimation of query model is an important task in language modeling (LM) approaches to information retrieval (IR). The ideal estimation is expected to be not only effective in terms of high mean retrieval performance over all queries, but also stable in terms of low variance of retrieval performance across different queries. In practice, however, improving effectiveness can sacrifice stability, and vice versa. In this paper, we propose to study this tradeoff from a new perspective, i.e., the bias-variance tradeoff, which is a fundamental theory in statistics. We formulate the notion of bias-variance regarding retrieval performance and estimation quality of query models. We then investigate several estimated query models, by analyzing when and why the bias-variance tradeoff will occur, and how the bias and variance can be reduced simultaneously. A series of experiments on four TREC collections have been conducted to systematically evaluate our bias-variance analysis. Our approach and results will potentially form an analysis framework and a novel evaluation strategy for query language modeling.
KW  - Bias-variance
KW  - Information retrieval
KW  - Query language model
UR  - http://www.scopus.com/inward/record.url?scp=84885100338&partnerID=8YFLogxK
U2  - 10.1016/j.ipm.2013.08.004
DO  - 10.1016/j.ipm.2013.08.004
M3  - Article
AN  - SCOPUS:84885100338
SN  - 0306-4573
VL  - 50
SP  - 199
EP  - 217
JO  - Information Processing and Management
JF  - Information Processing and Management
IS  - 1
ER  - 
