TY - JOUR
T1 - Generalized analysis of a distribution separation method
AU - Zhang, Peng
AU - Yu, Qian
AU - Hou, Yuexian
AU - Song, Dawei
AU - Li, Jingfei
AU - Hu, Bin
N1 - Publisher Copyright:
© 2016 by the authors.
PY - 2016/4/1
Y1 - 2016/4/1
N2 - Separating two probability distributions from a mixture model made up of a combination of the two is essential to a wide range of applications. For example, in information retrieval (IR), there often exists a mixture distribution consisting of a relevance distribution that we need to estimate and an irrelevance distribution that we hope to remove. Recently, a distribution separation method (DSM) was proposed to approximate the relevance distribution by separating a seed irrelevance distribution from the mixture distribution. It was successfully applied to an IR task, namely pseudo-relevance feedback (PRF), where the query expansion model is often a mixture term distribution. Although initially developed in the context of IR, DSM is in fact a general mathematical formulation for probability distribution separation. It is therefore important to further generalize its basic analysis and to explore its connections to other related methods. In this article, we first extend DSM's theoretical analysis, which was originally based on the Pearson correlation coefficient, to entropy-related measures, including the KL-divergence (Kullback-Leibler divergence), the symmetrized KL-divergence and the JS-divergence (Jensen-Shannon divergence). Second, we investigate the distribution separation idea in a well-known method, namely the mixture model feedback (MMF) approach. We prove that MMF also complies with the linear combination assumption, so that DSM's linear separation algorithm can largely simplify the EM algorithm in MMF. These theoretical analyses, together with further empirical evaluation results, demonstrate the advantages of our DSM approach.
AB - Separating two probability distributions from a mixture model made up of a combination of the two is essential to a wide range of applications. For example, in information retrieval (IR), there often exists a mixture distribution consisting of a relevance distribution that we need to estimate and an irrelevance distribution that we hope to remove. Recently, a distribution separation method (DSM) was proposed to approximate the relevance distribution by separating a seed irrelevance distribution from the mixture distribution. It was successfully applied to an IR task, namely pseudo-relevance feedback (PRF), where the query expansion model is often a mixture term distribution. Although initially developed in the context of IR, DSM is in fact a general mathematical formulation for probability distribution separation. It is therefore important to further generalize its basic analysis and to explore its connections to other related methods. In this article, we first extend DSM's theoretical analysis, which was originally based on the Pearson correlation coefficient, to entropy-related measures, including the KL-divergence (Kullback-Leibler divergence), the symmetrized KL-divergence and the JS-divergence (Jensen-Shannon divergence). Second, we investigate the distribution separation idea in a well-known method, namely the mixture model feedback (MMF) approach. We prove that MMF also complies with the linear combination assumption, so that DSM's linear separation algorithm can largely simplify the EM algorithm in MMF. These theoretical analyses, together with further empirical evaluation results, demonstrate the advantages of our DSM approach.
KW - Distribution separation
KW - Information retrieval
KW - KL-divergence
KW - Mixture model
UR - http://www.scopus.com/inward/record.url?scp=84964523881&partnerID=8YFLogxK
U2 - 10.3390/e18040105
DO - 10.3390/e18040105
M3 - Article
AN - SCOPUS:84964523881
SN - 1099-4300
VL - 18
JO - Entropy
JF - Entropy
IS - 4
M1 - 105
ER -