Locally weighted embedding topic modeling by Markov random walk structure approximation and sparse regularization

Chao Wei; Senlin Luo; Limin Pan; Zhouting Wu; Ji Zhang; Qamas Gul Khan Safi

doi:10.1016/j.neucom.2018.01.030

Corresponding author: Limin Pan

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Topic models are a practical method for learning interpretable models of text corpora, and topic modeling has become a key problem in document representation. Some recently proposed topic models incorporate the intrinsic geometrical information of the document manifold and yield a discriminative topic representation. However, existing manifold-inspired topic models fail to provide probability weighting information for the local geometrical pattern, which limits their ability to estimate the intrinsic semantic information of the topic representation. In this paper, we consider the problem of topic modeling with the intrinsic structure of the document manifold and propose an unsupervised AutoEncoder-based topic modeling framework, named the locally weighted embedding topic model (LWE-TM). Unlike existing manifold-inspired topic models, LWE-TM defines a group of probability coefficients that uncover the local geometrical pattern through the Markov random walk structure of an affinity graph, and regularizes the training of a sparse AutoEncoder (sAE) to explicitly recover this local geometrical pattern in the topic encoding. Under the regularized training framework, the encoding network becomes locally invariant around the neighborhood of the document manifold and enables ready topic inference for out-of-sample documents, effectively improving the generalization and discrimination of the topic encoding. Experimental results on two widely used corpora demonstrate the superiority of LWE-TM over comparative models in document modeling, document clustering, and classification tasks.
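The abstract does not state how the probability coefficients are defined, so the following is only a generic illustration of a Markov random walk on an affinity graph, not the authors' formulation. It assumes a k-nearest-neighbour graph built from cosine similarity between document vectors and takes the row-normalized affinity matrix as the one-step transition matrix. The function name `random_walk_weights` and the parameters `k` and `steps` are hypothetical.

```python
import numpy as np


def random_walk_weights(X, k=2, steps=1):
    """Row-stochastic random-walk weights on a kNN cosine affinity graph.

    Illustrative assumption only: the graph construction and normalization
    here are generic choices, not taken from the LWE-TM paper.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of documents")

    # Cosine similarity between document vectors (rows of X).
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)  # never pick a document as its own neighbour

    # Keep each document's k most similar neighbours as graph edges.
    nbrs = np.argsort(-S, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()
    W = np.zeros((n, n))
    # Clip negative similarities; the small constant keeps every row non-empty.
    W[rows, cols] = np.maximum(S[rows, cols], 0.0) + 1e-12
    W = np.maximum(W, W.T)  # symmetrize the affinity graph

    # One-step transition matrix P = D^{-1} W; P^steps gives multi-step walks.
    P = W / W.sum(axis=1, keepdims=True)
    return np.linalg.matrix_power(P, steps)


# Toy term-count matrix: four documents over four terms.
X = np.array([[2, 1, 0, 0],
              [1, 2, 0, 0],
              [0, 0, 1, 2],
              [0, 0, 2, 1]], dtype=float)
P = random_walk_weights(X, k=1)
```

Each row of `P` is a probability distribution over a document's graph neighbours, which is the kind of local weighting a regularizer could ask an encoder to preserve.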

Original language	English
Pages (from-to)	35-50
Number of pages	16
Journal	Neurocomputing
Volume	285
DOI	https://doi.org/10.1016/j.neucom.2018.01.030
Publication status	Published - 12 Apr 2018

Cite this

@article{eb170e1c776a47409e46775cfbbe2c8c,

title = "Locally weighted embedding topic modeling by {M}arkov random walk structure approximation and sparse regularization",

abstract = "Topic models are a practical method for learning interpretable models of text corpora, and topic modeling has become a key problem in document representation. Some recently proposed topic models incorporate the intrinsic geometrical information of the document manifold and yield a discriminative topic representation. However, existing manifold-inspired topic models fail to provide probability weighting information for the local geometrical pattern, which limits their ability to estimate the intrinsic semantic information of the topic representation. In this paper, we consider the problem of topic modeling with the intrinsic structure of the document manifold and propose an unsupervised AutoEncoder-based topic modeling framework, named the locally weighted embedding topic model (LWE-TM). Unlike existing manifold-inspired topic models, LWE-TM defines a group of probability coefficients that uncover the local geometrical pattern through the Markov random walk structure of an affinity graph, and regularizes the training of a sparse AutoEncoder (sAE) to explicitly recover this local geometrical pattern in the topic encoding. Under the regularized training framework, the encoding network becomes locally invariant around the neighborhood of the document manifold and enables ready topic inference for out-of-sample documents, effectively improving the generalization and discrimination of the topic encoding. Experimental results on two widely used corpora demonstrate the superiority of LWE-TM over comparative models in document modeling, document clustering, and classification tasks.",

keywords = "Affine mapping, Markov random walk, Sparse AutoEncoder, Topic model",

author = "Chao Wei and Senlin Luo and Limin Pan and Zhouting Wu and Ji Zhang and Safi, {Qamas Gul Khan}",

note = "Publisher Copyright: {\textcopyright} 2018 Elsevier Ltd",

year = "2018",

month = apr,

day = "12",

doi = "10.1016/j.neucom.2018.01.030",

language = "English",

volume = "285",

pages = "35--50",

journal = "Neurocomputing",

issn = "0925-2312",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - Locally weighted embedding topic modeling by Markov random walk structure approximation and sparse regularization

AU - Wei, Chao

AU - Luo, Senlin

AU - Pan, Limin

AU - Wu, Zhouting

AU - Zhang, Ji

AU - Safi, Qamas Gul Khan

PY - 2018/4/12

Y1 - 2018/4/12

AB - Topic models are a practical method for learning interpretable models of text corpora, and topic modeling has become a key problem in document representation. Some recently proposed topic models incorporate the intrinsic geometrical information of the document manifold and yield a discriminative topic representation. However, existing manifold-inspired topic models fail to provide probability weighting information for the local geometrical pattern, which limits their ability to estimate the intrinsic semantic information of the topic representation. In this paper, we consider the problem of topic modeling with the intrinsic structure of the document manifold and propose an unsupervised AutoEncoder-based topic modeling framework, named the locally weighted embedding topic model (LWE-TM). Unlike existing manifold-inspired topic models, LWE-TM defines a group of probability coefficients that uncover the local geometrical pattern through the Markov random walk structure of an affinity graph, and regularizes the training of a sparse AutoEncoder (sAE) to explicitly recover this local geometrical pattern in the topic encoding. Under the regularized training framework, the encoding network becomes locally invariant around the neighborhood of the document manifold and enables ready topic inference for out-of-sample documents, effectively improving the generalization and discrimination of the topic encoding. Experimental results on two widely used corpora demonstrate the superiority of LWE-TM over comparative models in document modeling, document clustering, and classification tasks.

KW - Affine mapping

KW - Markov random walk

KW - Sparse AutoEncoder

KW - Topic model

UR - http://www.scopus.com/inward/record.url?scp=85044858410&partnerID=8YFLogxK

U2 - 10.1016/j.neucom.2018.01.030

DO - 10.1016/j.neucom.2018.01.030

M3 - Article

AN - SCOPUS:85044858410

SN - 0925-2312

VL - 285

SP - 35

EP - 50

JO - Neurocomputing

JF - Neurocomputing

ER -
