Modeling positive and negative feedback for improving document retrieval

Shufeng Hao; Chongyang Shi; Zhendong Niu; Longbing Cao

doi:10.1016/j.eswa.2018.11.035

Shufeng Hao, Chongyang Shi^*, Zhendong Niu, Longbing Cao

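^*Corresponding author for this work

School of Computer Science

Research output: Contribution to journal › Article › peer-review

5 citations (Scopus)

Abstract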

Pseudo-relevance feedback (PRF) has evident potential for enriching the representation of short queries. Traditional PRF methods treat top-ranked documents as feedback, since they are assumed to be relevant to the query. However, some of these feedback documents may actually distract from the query topic for a range of reasons and accordingly downgrade PRF system performance. Such documents constitute negative examples (negative feedback) but could also be valuable in retrieval. In this paper, a novel framework of query language model construction is proposed in order to improve retrieval performance by integrating both positive and negative feedback. First, an improvement-based method is proposed to automatically identify the types of feedback documents (i.e. positive or negative) according to whether the document enhances the retrieval's effectiveness. Subsequently, based on the learned positive and negative examples, the positive feedback models and the negative feedback models are estimated using an Expectation-Maximization algorithm with the assumptions: the positive term distribution is affected by the context term distribution and the negative term distribution is affected by both the positive term distribution and the context term distribution (such that the positive feedback model upgrades the rankings of relevant documents and the negative feedback model prunes the irrelevant documents from a query). Finally, a content-based representativeness criterion is proposed in order to obtain the representative negative feedback documents. Experiments conducted on the TREC collections demonstrate that our proposed approach results in better retrieval accuracy and robustness than baseline methods.

Original language	English
Pages (from-to)	253-261
Number of pages	9
Journal	Expert Systems with Applications
Volume	120
DOI	https://doi.org/10.1016/j.eswa.2018.11.035
Publication status	Published - 15 Apr 2019

Cite this

@article{a2c70b1429a147038b61744b978ef8d1,

title = "Modeling positive and negative feedback for improving document retrieval",

keywords = "Language model, Negative feedback, Positive feedback, Pseudo-relevance feedback",

author = "Shufeng Hao and Chongyang Shi and Zhendong Niu and Longbing Cao",

note = "Publisher Copyright: {\textcopyright} 2018 Elsevier Ltd",

year = "2019",

month = apr,

day = "15",

doi = "10.1016/j.eswa.2018.11.035",

language = "English",

volume = "120",

pages = "253--261",

journal = "Expert Systems with Applications",

issn = "0957-4174",

publisher = "Elsevier Ltd.",

}

TY - JOUR

T1 - Modeling positive and negative feedback for improving document retrieval

AU - Hao, Shufeng

AU - Shi, Chongyang

AU - Niu, Zhendong

AU - Cao, Longbing

PY - 2019/4/15

Y1 - 2019/4/15

KW - Language model

KW - Negative feedback

KW - Positive feedback

KW - Pseudo-relevance feedback

UR - http://www.scopus.com/inward/record.url?scp=85057467375&partnerID=8YFLogxK

U2 - 10.1016/j.eswa.2018.11.035

DO - 10.1016/j.eswa.2018.11.035

M3 - Article

AN - SCOPUS:85057467375

SN - 0957-4174

VL - 120

SP - 253

EP - 261

JO - Expert Systems with Applications

JF - Expert Systems with Applications

ER -
