Mining pure high-order word associations via information geometry for information retrieval

Yuexian Hou, Xiaozhao Zhao, Dawei Song*, Wenjie Li

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

20 Citations (Scopus)

Abstract

The classical bag-of-words models for information retrieval (IR) fail to capture contextual associations between words. In this article, we propose to investigate pure high-order dependence among a number of words forming an inseparable semantic entity, that is, the high-order dependence that cannot be reduced to the random coincidence of lower-order dependencies. We believe that identifying these pure high-order dependence patterns would lead to a better representation of documents and novel retrieval models. Specifically, two formal definitions of pure dependence are given: unconditional pure dependence (UPD) and conditional pure dependence (CPD). The exact decision on UPD and CPD, however, is NP-hard in general. We hence derive and prove the sufficient criteria that entail UPD and CPD, within the well-principled information geometry (IG) framework, leading to a more feasible UPD/CPD identification procedure. We further develop novel methods for extracting word patterns with pure high-order dependence. Our methods are applied to and extensively evaluated on three typical IR tasks: text classification, and text retrieval without and with query expansion.

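Original language: English
Journal: ACM Transactions on Information Systems
Volume: 31
Issue number: 3
DOI: https://doi.org/10.1145/2493175.2493177
Publication status: Published - Jul 2013
Externally published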

Cite this

Hou, Y., Zhao, X., Song, D., & Li, W. (2013). Mining pure high-order word associations via information geometry for information retrieval. ACM Transactions on Information Systems, 31(3). https://doi.org/10.1145/2493175.2493177