TY - GEN
T1 - A study of per-topic variance on system comparison
AU - Yang, Meng
AU - Zhang, Peng
AU - Song, Dawei
N1 - Publisher Copyright: © 2018 ACM.
PY - 2018/6/27
Y1 - 2018/6/27
N2 - Under the notion that the document collection is a sample from a population, the observed per-topic metric (e.g., AP) value varies across samples, giving rise to per-topic variance. The results of system comparison, such as ranking systems by a summary metric (e.g., MAP) or testing whether there is a significant difference between two systems, are affected by the variability of per-topic metric values. In this paper, we study the effect of per-topic variance on system comparison. To measure this effect, we employ two ranking-based methods, Error Rate (ER) and Kendall Rank Correlation Coefficient (KRCC), as well as two significance-test-based methods, Achieved Significance Level (ASL) and Estimated Difference (ED). We conduct an empirical comparison of systems that participated in the TREC Robust and Adhoc tracks, which shows that the effect of per-topic variance on the ranking of systems is not obvious, whereas significance-test-based comparisons are susceptible to per-topic variance.
KW - Evaluation
KW - Per-topic variance
KW - System comparison
UR - http://www.scopus.com/inward/record.url?scp=85051469174&partnerID=8YFLogxK
U2 - 10.1145/3209978.3210122
DO - 10.1145/3209978.3210122
M3 - Conference contribution
AN - SCOPUS:85051469174
T3 - 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018
SP - 1181
EP - 1184
BT - 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018
PB - Association for Computing Machinery, Inc
T2 - 41st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018
Y2 - 8 July 2018 through 12 July 2018
ER -