
Approximating true relevance distribution from a mixture model based on irrelevance data

  • Peng Zhang*
  • Yuexian Hou
  • Dawei Song
  • *Corresponding author for this work
  • Robert Gordon University
  • Tianjin University

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Pseudo relevance feedback (PRF), which has been widely applied in IR, aims to derive a distribution from the top n pseudo-relevant documents D. However, these documents are often a mixture of relevant and irrelevant documents. As a result, the derived distribution is actually a mixture model, which has long limited the performance of PRF. This is particularly the case for difficult queries, where the truly relevant documents in D are very sparse. In this situation, it is often easier to identify a small number of seed irrelevant documents, which can form a seed irrelevance distribution. A fundamental and challenging problem then arises: based solely on the mixed distribution and a seed irrelevance distribution, how can an optimal approximation of the true relevance distribution be generated automatically? In this paper, we propose a novel distribution separation model (DSM) to tackle this problem. Theoretical justifications of the proposed algorithm are given. Evaluation results from our extensive simulated experiments on several large-scale TREC data sets demonstrate the effectiveness of our method, which outperforms a well-respected PRF model, the Relevance Model (RM), as well as the use of RM on D with the seed negative documents directly removed.
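
To make the separation problem concrete, here is a minimal sketch, not the paper's DSM algorithm. It assumes the mixed distribution is a linear combination of the relevance and irrelevance distributions and that the mixing weight `lam` is already known; both are assumptions made here for illustration, since the abstract does not say how DSM models or estimates them. The function name `separate_distribution` and the toy term probabilities are hypothetical.

```python
def separate_distribution(mixture, irrelevant, lam):
    """Recover a relevance distribution over terms.

    Assumption (illustrative only): mixture = lam * relevant + (1 - lam) * irrelevant,
    with the mixing weight lam known in advance.
    Distributions are dicts mapping term -> probability.
    """
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must be in (0, 1]")

    terms = set(mixture) | set(irrelevant)

    # Invert the assumed linear mixture term by term.
    raw = {
        t: (mixture.get(t, 0.0) - (1.0 - lam) * irrelevant.get(t, 0.0)) / lam
        for t in terms
    }

    # Noise in the inputs can push values below zero; clip and renormalise
    # so the result is a valid probability distribution.
    clipped = {t: max(v, 0.0) for t, v in raw.items()}
    total = sum(clipped.values())
    if total == 0.0:
        raise ValueError("nothing left after removing the irrelevance distribution")
    return {t: v / total for t, v in clipped.items()}


# Toy example: an equal mixture of a relevance distribution over {a, b}
# and an irrelevance distribution over {b, c}.
mixture = {"a": 0.25, "b": 0.5, "c": 0.25}
irrelevant = {"b": 0.5, "c": 0.5}
relevant = separate_distribution(mixture, irrelevant, lam=0.5)
```

In this toy case the recovered distribution puts equal mass on `a` and `b` and none on `c`. In practice the mixing weight is unknown, which is the hard part of the problem the abstract describes.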

Original language: English
Title of host publication: Proceedings - 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2009
Pages: 107-114
Number of pages: 8
Publication status: Published - 2009
Externally published: Yes
Event: 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2009 - Boston, MA, United States
Duration: 19 Jul 2009 to 23 Jul 2009

Publication series

Name: Proceedings - 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2009

Conference

Conference: 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2009
Country/Territory: United States
City: Boston, MA
Period: 19/07/09 to 23/07/09
