Generalized bias-variance evaluation of TREC participated systems

Peng Zhang; Linxue Hao; Dawei Song; Jun Wang; Yuexian Hou; Bin Hu

doi:10.1145/2661829.2661934

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

6 Citations (Scopus)

Abstract

Recent research has shown that the improvement of mean retrieval effectiveness (e.g., MAP) may sacrifice the retrieval stability across queries, implying a tradeoff between effectiveness and stability. The evaluation of both effectiveness and stability are often based on a baseline model, which could be weak or biased. In addition, the effectiveness-stability tradeoff has not been systematically or quantitatively evaluated over TREC participated systems. The above two problems, to some extent, limit our awareness of such tradeoff and its impact on developing future IR models. In this paper, motivated by a recently proposed bias-variance based evaluation, we adopt a strong and unbiased "baseline", which is a virtual target model constructed by the best performance (for each query) among all the participated systems in a retrieval task. We also propose generalized bias-variance metrics, based on which a systematic and quantitative evaluation of the effectiveness-stability tradeoff is carried out over the participated systems in the TREC Ad-hoc Track (1993-1999) and Web Track (2010-2012). We observe a clear effectiveness-stability tradeoff, with a trend of becoming more obvious in more recent years. This implies that when we pursue more effective IR systems over years, the stability has become problematic and could have been largely overlooked.

Original language	English
Title of host publication	CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management
Publisher	Association for Computing Machinery
Pages	1911-1914
Number of pages	4
ISBN (Electronic)	9781450325981
DOI	https://doi.org/10.1145/2661829.2661934
Publication status	Published - 3 Nov 2014
Externally published	Yes
Event	23rd ACM International Conference on Information and Knowledge Management, CIKM 2014 - Shanghai, China. Duration: 3 Nov 2014 → 7 Nov 2014

Publication series

Name	CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management

Conference

Conference	23rd ACM International Conference on Information and Knowledge Management, CIKM 2014
Country/Territory	China
City	Shanghai
Period	3/11/14 → 7/11/14

Access to Document

10.1145/2661829.2661934

Other files and links

Link to publication in Scopus

Cite this

Zhang, P., Hao, L., Song, D., Wang, J., Hou, Y., & Hu, B. (2014). Generalized bias-variance evaluation of TREC participated systems. In CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management (pp. 1911-1914). (CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management). Association for Computing Machinery. https://doi.org/10.1145/2661829.2661934

Zhang, Peng ; Hao, Linxue ; Song, Dawei et al. / Generalized bias-variance evaluation of TREC participated systems. CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management. Association for Computing Machinery, 2014. pp. 1911-1914 (CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management).

@inproceedings{eebad92985b045d1947114c499856edf,

title = "Generalized bias-variance evaluation of TREC participated systems",

abstract = "Recent research has shown that the improvement of mean retrieval effectiveness (e.g., MAP) may sacrifice the retrieval stability across queries, implying a tradeoff between effectiveness and stability. The evaluation of both effectiveness and stability are often based on a baseline model, which could be weak or biased. In addition, the effectiveness-stability tradeoff has not been systematically or quantitatively evaluated over TREC participated systems. The above two problems, to some extent, limit our awareness of such tradeoff and its impact on developing future IR models. In this paper, motivated by a recently proposed bias-variance based evaluation, we adopt a strong and unbiased {"}baseline{"}, which is a virtual target model constructed by the best performance (for each query) among all the participated systems in a retrieval task. We also propose generalized bias-variance metrics, based on which a systematic and quantitative evaluation of the effectiveness-stability tradeoff is carried out over the participated systems in the TREC Ad-hoc Track (1993-1999) and Web Track (2010-2012). We observe a clear effectiveness-stability tradeoff, with a trend of becoming more obvious in more recent years. This implies that when we pursue more effective IR systems over years, the stability has become problematic and could have been largely overlooked.",

keywords = "Bias-variance tradeoff, Effectiveness-stability tradeoff, Evaluation, Virtual target model",

author = "Peng Zhang and Linxue Hao and Dawei Song and Jun Wang and Yuexian Hou and Bin Hu",

year = "2014",

month = nov,

day = "3",

doi = "10.1145/2661829.2661934",

language = "English",

series = "CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management",

publisher = "Association for Computing Machinery",

pages = "1911--1914",

booktitle = "CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management",

}

Zhang, P, Hao, L, Song, D, Wang, J, Hou, Y & Hu, B 2014, Generalized bias-variance evaluation of TREC participated systems. In CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management. CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management, Association for Computing Machinery, pp. 1911-1914, 23rd ACM International Conference on Information and Knowledge Management, CIKM 2014, Shanghai, China, 3/11/14. https://doi.org/10.1145/2661829.2661934

Generalized bias-variance evaluation of TREC participated systems. / Zhang, Peng; Hao, Linxue; Song, Dawei et al.
CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management. Association for Computing Machinery, 2014. pp. 1911-1914 (CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management).

TY - GEN

T1 - Generalized bias-variance evaluation of TREC participated systems

AU - Zhang, Peng

AU - Hao, Linxue

AU - Song, Dawei

AU - Wang, Jun

AU - Hou, Yuexian

AU - Hu, Bin

PY - 2014/11/3

Y1 - 2014/11/3

AB - Recent research has shown that the improvement of mean retrieval effectiveness (e.g., MAP) may sacrifice the retrieval stability across queries, implying a tradeoff between effectiveness and stability. The evaluation of both effectiveness and stability are often based on a baseline model, which could be weak or biased. In addition, the effectiveness-stability tradeoff has not been systematically or quantitatively evaluated over TREC participated systems. The above two problems, to some extent, limit our awareness of such tradeoff and its impact on developing future IR models. In this paper, motivated by a recently proposed bias-variance based evaluation, we adopt a strong and unbiased "baseline", which is a virtual target model constructed by the best performance (for each query) among all the participated systems in a retrieval task. We also propose generalized bias-variance metrics, based on which a systematic and quantitative evaluation of the effectiveness-stability tradeoff is carried out over the participated systems in the TREC Ad-hoc Track (1993-1999) and Web Track (2010-2012). We observe a clear effectiveness-stability tradeoff, with a trend of becoming more obvious in more recent years. This implies that when we pursue more effective IR systems over years, the stability has become problematic and could have been largely overlooked.

KW - Bias-variance tradeoff

KW - Effectiveness-stability tradeoff

KW - Evaluation

KW - Virtual target model

UR - http://www.scopus.com/inward/record.url?scp=84937559411&partnerID=8YFLogxK

U2 - 10.1145/2661829.2661934

DO - 10.1145/2661829.2661934

M3 - Conference contribution

AN - SCOPUS:84937559411

T3 - CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management

SP - 1911

EP - 1914

BT - CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management

PB - Association for Computing Machinery

T2 - 23rd ACM International Conference on Information and Knowledge Management, CIKM 2014

Y2 - 3 November 2014 through 7 November 2014

ER -

Zhang P, Hao L, Song D, Wang J, Hou Y, Hu B. Generalized bias-variance evaluation of TREC participated systems. In CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management. Association for Computing Machinery. 2014. p. 1911-1914. (CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management). doi: 10.1145/2661829.2661934
