An effective approach to verbose queries using a limited dependencies language model

Eduard Hoenkamp; Peter Bruza; Dawei Song; Qiang Huang

doi:10.1007/978-3-642-04417-5_11

Eduard Hoenkamp^*, Peter Bruza, Dawei Song, Qiang Huang

^*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

9 Citations (Scopus)

Abstract

Intuitively, any 'bag of words' approach in IR should benefit from taking term dependencies into account. Unfortunately, for years the results of exploiting such dependencies have been mixed or inconclusive. To improve the situation, this paper shows how the natural language properties of the target documents can be used to transform and enrich the term dependencies to more useful statistics. This is done in three steps. The term co-occurrence statistics of queries and documents are each represented by a Markov chain. The paper proves that such a chain is ergodic, and therefore its asymptotic behavior is unique, stationary, and independent of the initial state. Next, the stationary distribution is taken to model queries and documents, rather than their initial distributions. Finally, ranking is achieved following the customary language modeling paradigm. The main contribution of this paper is to argue why the asymptotic behavior of the document model is a better representation than just the document's initial distribution. A secondary contribution is to investigate the practical application of this representation in case the queries become increasingly verbose. In the experiments (based on Lemur's search engine substrate) the default query model was replaced by the stable distribution of the query. Just modeling the query this way already resulted in significant improvements over a standard language model baseline. The results were on a par with or better than more sophisticated algorithms that use fine-tuned parameters or extensive training. Moreover, the more verbose the query, the more effective the approach seems to become.
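
The first two steps described in the abstract can be made concrete with a small sketch. It is not the authors' implementation: the windowed co-occurrence counts, the `window` and `smoothing` parameters, and the function names `cooccurrence_chain` and `stationary_distribution` are assumptions chosen for illustration. The additive smoothing makes every transition probability positive, which is one simple way to obtain an ergodic chain; the paper proves ergodicity for its own construction.

```python
def cooccurrence_chain(tokens, window=2, smoothing=0.01):
    """Build a row-stochastic transition matrix from windowed co-occurrence counts.

    Assumption: terms co-occur when they are at most `window` positions apart.
    The positive `smoothing` value keeps every entry above zero, so the chain
    is irreducible and aperiodic (hence ergodic).
    """
    vocab = sorted(set(tokens))
    index = {term: i for i, term in enumerate(vocab)}
    n = len(vocab)
    counts = [[smoothing] * n for _ in range(n)]
    for i, term in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[index[term]][index[tokens[j]]] += 1.0
    # Normalize each row so that it sums to one.
    matrix = [[c / sum(row) for c in row] for row in counts]
    return vocab, matrix


def stationary_distribution(matrix, start=None, tol=1e-12, max_iter=10000):
    """Approximate the stationary distribution by repeated multiplication (power iteration)."""
    n = len(matrix)
    dist = list(start) if start is not None else [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, dist)) < tol:
            return nxt
        dist = nxt
    return dist


# Toy verbose query (hypothetical example text).
query = "term dependencies improve verbose queries when term dependencies are modeled".split()
vocab, transition = cooccurrence_chain(query)
query_model = stationary_distribution(transition)
for term, prob in sorted(zip(vocab, query_model), key=lambda pair: -pair[1]):
    print(f"{term:14s} {prob:.4f}")
```

Starting the iteration from a different distribution should give the same result up to numerical tolerance, which is the independence from the initial state that the abstract refers to. The resulting vector could then stand in for the query model in a standard language-modeling ranker.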

Original language	English
Title of host publication	Advances in Information Retrieval Theory - Second International Conference on the Theory of Information Retrieval, ICTIR 2009, Proceedings
Pages	116-127
Number of pages	12
DOI	https://doi.org/10.1007/978-3-642-04417-5_11
Publication status	Published - 2009
Externally published	Yes
Event	2nd International Conference on the Theory of Information Retrieval, ICTIR 2009 - Cambridge, United Kingdom. Duration: 10 Sept 2009 → 12 Sept 2009

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	5766 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	2nd International Conference on the Theory of Information Retrieval, ICTIR 2009
Country/Territory	United Kingdom
City	Cambridge
Period	10/09/09 → 12/09/09

Cite this

Hoenkamp, E., Bruza, P., Song, D., & Huang, Q. (2009). An effective approach to verbose queries using a limited dependencies language model. In Advances in Information Retrieval Theory - Second International Conference on the Theory of Information Retrieval, ICTIR 2009, Proceedings (pp. 116-127). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5766 LNCS). https://doi.org/10.1007/978-3-642-04417-5_11

Hoenkamp, Eduard ; Bruza, Peter ; Song, Dawei et al. / An effective approach to verbose queries using a limited dependencies language model. Advances in Information Retrieval Theory - Second International Conference on the Theory of Information Retrieval, ICTIR 2009, Proceedings. 2009. pp. 116-127 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{7d6ae1a8c5224c35a83d411e4c160583,
title = "An effective approach to verbose queries using a limited dependencies language model",
abstract = "Intuitively, any 'bag of words' approach in IR should benefit from taking term dependencies into account. Unfortunately, for years the results of exploiting such dependencies have been mixed or inconclusive. To improve the situation, this paper shows how the natural language properties of the target documents can be used to transform and enrich the term dependencies to more useful statistics. This is done in three steps. The term co-occurrence statistics of queries and documents are each represented by a Markov chain. The paper proves that such a chain is ergodic, and therefore its asymptotic behavior is unique, stationary, and independent of the initial state. Next, the stationary distribution is taken to model queries and documents, rather than their initial distributions. Finally, ranking is achieved following the customary language modeling paradigm. The main contribution of this paper is to argue why the asymptotic behavior of the document model is a better representation than just the document's initial distribution. A secondary contribution is to investigate the practical application of this representation in case the queries become increasingly verbose. In the experiments (based on Lemur's search engine substrate) the default query model was replaced by the stable distribution of the query. Just modeling the query this way already resulted in significant improvements over a standard language model baseline. The results were on a par with or better than more sophisticated algorithms that use fine-tuned parameters or extensive training. Moreover, the more verbose the query, the more effective the approach seems to become.",
author = "Eduard Hoenkamp and Peter Bruza and Dawei Song and Qiang Huang",
year = "2009",
doi = "10.1007/978-3-642-04417-5_11",
language = "English",
isbn = "3642044166",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "116--127",
booktitle = "Advances in Information Retrieval Theory - Second International Conference on the Theory of Information Retrieval, ICTIR 2009, Proceedings",
note = "2nd International Conference on the Theory of Information Retrieval, ICTIR 2009 ; Conference date: 10-09-2009 Through 12-09-2009",
}

Hoenkamp, E, Bruza, P, Song, D & Huang, Q 2009, An effective approach to verbose queries using a limited dependencies language model. in Advances in Information Retrieval Theory - Second International Conference on the Theory of Information Retrieval, ICTIR 2009, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 5766 LNCS, pp. 116-127, 2nd International Conference on the Theory of Information Retrieval, ICTIR 2009, Cambridge, United Kingdom, 10/09/09. https://doi.org/10.1007/978-3-642-04417-5_11

An effective approach to verbose queries using a limited dependencies language model. / Hoenkamp, Eduard; Bruza, Peter; Song, Dawei et al.
Advances in Information Retrieval Theory - Second International Conference on the Theory of Information Retrieval, ICTIR 2009, Proceedings. 2009. pp. 116-127 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5766 LNCS).

TY - GEN
T1 - An effective approach to verbose queries using a limited dependencies language model
AU - Hoenkamp, Eduard
AU - Bruza, Peter
AU - Song, Dawei
AU - Huang, Qiang
PY - 2009
Y1 - 2009
N2 - Intuitively, any 'bag of words' approach in IR should benefit from taking term dependencies into account. Unfortunately, for years the results of exploiting such dependencies have been mixed or inconclusive. To improve the situation, this paper shows how the natural language properties of the target documents can be used to transform and enrich the term dependencies to more useful statistics. This is done in three steps. The term co-occurrence statistics of queries and documents are each represented by a Markov chain. The paper proves that such a chain is ergodic, and therefore its asymptotic behavior is unique, stationary, and independent of the initial state. Next, the stationary distribution is taken to model queries and documents, rather than their initial distributions. Finally, ranking is achieved following the customary language modeling paradigm. The main contribution of this paper is to argue why the asymptotic behavior of the document model is a better representation than just the document's initial distribution. A secondary contribution is to investigate the practical application of this representation in case the queries become increasingly verbose. In the experiments (based on Lemur's search engine substrate) the default query model was replaced by the stable distribution of the query. Just modeling the query this way already resulted in significant improvements over a standard language model baseline. The results were on a par with or better than more sophisticated algorithms that use fine-tuned parameters or extensive training. Moreover, the more verbose the query, the more effective the approach seems to become.
AB - Intuitively, any 'bag of words' approach in IR should benefit from taking term dependencies into account. Unfortunately, for years the results of exploiting such dependencies have been mixed or inconclusive. To improve the situation, this paper shows how the natural language properties of the target documents can be used to transform and enrich the term dependencies to more useful statistics. This is done in three steps. The term co-occurrence statistics of queries and documents are each represented by a Markov chain. The paper proves that such a chain is ergodic, and therefore its asymptotic behavior is unique, stationary, and independent of the initial state. Next, the stationary distribution is taken to model queries and documents, rather than their initial distributions. Finally, ranking is achieved following the customary language modeling paradigm. The main contribution of this paper is to argue why the asymptotic behavior of the document model is a better representation than just the document's initial distribution. A secondary contribution is to investigate the practical application of this representation in case the queries become increasingly verbose. In the experiments (based on Lemur's search engine substrate) the default query model was replaced by the stable distribution of the query. Just modeling the query this way already resulted in significant improvements over a standard language model baseline. The results were on a par with or better than more sophisticated algorithms that use fine-tuned parameters or extensive training. Moreover, the more verbose the query, the more effective the approach seems to become.
UR - http://www.scopus.com/inward/record.url?scp=70350597567&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-04417-5_11
DO - 10.1007/978-3-642-04417-5_11
M3 - Conference contribution
AN - SCOPUS:70350597567
SN - 3642044166
SN - 9783642044168
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 116
EP - 127
BT - Advances in Information Retrieval Theory - Second International Conference on the Theory of Information Retrieval, ICTIR 2009, Proceedings
T2 - 2nd International Conference on the Theory of Information Retrieval, ICTIR 2009
Y2 - 10 September 2009 through 12 September 2009
ER -

Hoenkamp E, Bruza P, Song D, Huang Q. An effective approach to verbose queries using a limited dependencies language model. In Advances in Information Retrieval Theory - Second International Conference on the Theory of Information Retrieval, ICTIR 2009, Proceedings. 2009. p. 116-127. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-642-04417-5_11