Locally weighted embedding topic modeling by Markov random walk structure approximation and sparse regularization

Chao Wei; Senlin Luo; Limin Pan; Zhouting Wu; Ji Zhang; Qamas Gul Khan Safi

doi:10.1016/j.neucom.2018.01.030

Chao Wei, Senlin Luo, Limin Pan^*, Zhouting Wu, Ji Zhang, Qamas Gul Khan Safi

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Topic modeling is a practical method for learning interpretable models of text corpora and has become a key problem in document representation. Some recently proposed topic models incorporate the intrinsic geometrical information of the document manifold and yield a discriminative topic representation. However, existing manifold-inspired topic models fail to provide probability weighting information for the local geometrical pattern, which limits their ability to estimate the intrinsic semantic information of the topic representation. In this paper, we consider the problem of topic modeling with the intrinsic structure of the document manifold and propose an unsupervised AutoEncoder-based topic modeling framework, named the locally weighted embedding topic model (LWE-TM). Unlike existing manifold-inspired topic models, LWE-TM defines a group of probability coefficients that uncover the local geometrical pattern through the Markov random walk structure of the affinity graph, and regularizes the training of a sparse AutoEncoder (sAE) to explicitly recover this local geometrical pattern in the topic encoding. Under the regularized training framework, the encoding network becomes locally invariant around the neighborhood of the document manifold and enables ready topic inference for out-of-sample documents, efficiently improving the generalization and discrimination of the topic encoding. Experimental results on two widely used corpora demonstrate the superiority of LWE-TM over comparative models in document modeling, document clustering, and classification tasks.
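
As a rough illustration of the "Markov random walk structure of affinity graph" idea mentioned in the abstract, the sketch below builds a k-nearest-neighbor affinity graph over a few toy documents and row-normalizes it into one-step transition probabilities. This is a generic construction, not the paper's formulation: the abstract does not specify how LWE-TM defines its probability coefficients, so the cosine similarity, the directed kNN graph, the toy vectors, and all function names here are assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_affinity(docs, k):
    """Directed kNN affinity graph: W[i][j] > 0 only if j is among
    the k documents most similar to i (assumed construction)."""
    n = len(docs)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sims = sorted(
            ((cosine(docs[i], docs[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        for sim, j in sims[:k]:
            W[i][j] = sim
    return W


def transition_matrix(W):
    """Row-normalize affinities into one-step random-walk probabilities."""
    P = []
    for row in W:
        total = sum(row)
        P.append([w / total if total else 0.0 for w in row])
    return P


# Toy term-count vectors (hypothetical data): two rough "topics" plus a mixed document.
docs = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
    [1, 0, 1, 0],
]

P = transition_matrix(knn_affinity(docs, k=2))
for i, row in enumerate(P):
    print(i, [round(p, 3) for p in row])
```

Each row of `P` is a probability distribution over a document's neighbors, so more similar neighbors receive more weight. A regularizer in the spirit of the abstract could then penalize topic encodings that fail to reflect these local weights, though the exact penalty used by LWE-TM is not stated here.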

Original language	English
Pages (from-to)	35-50
Number of pages	16
Journal	Neurocomputing
Volume	285
DOIs	https://doi.org/10.1016/j.neucom.2018.01.030
Publication status	Published - 12 Apr 2018

Keywords

Affine mapping
Markov random walk
Sparse AutoEncoder
Topic model

Access to Document

10.1016/j.neucom.2018.01.030

Cite this

@article{eb170e1c776a47409e46775cfbbe2c8c,

title = "Locally weighted embedding topic modeling by {M}arkov random walk structure approximation and sparse regularization",

abstract = "Topic modeling is a practical method for learning interpretable models of text corpora and has become a key problem in document representation. Some recently proposed topic models incorporate the intrinsic geometrical information of the document manifold and yield a discriminative topic representation. However, existing manifold-inspired topic models fail to provide probability weighting information for the local geometrical pattern, which limits their ability to estimate the intrinsic semantic information of the topic representation. In this paper, we consider the problem of topic modeling with the intrinsic structure of the document manifold and propose an unsupervised AutoEncoder-based topic modeling framework, named the locally weighted embedding topic model (LWE-TM). Unlike existing manifold-inspired topic models, LWE-TM defines a group of probability coefficients that uncover the local geometrical pattern through the Markov random walk structure of the affinity graph, and regularizes the training of a sparse AutoEncoder (sAE) to explicitly recover this local geometrical pattern in the topic encoding. Under the regularized training framework, the encoding network becomes locally invariant around the neighborhood of the document manifold and enables ready topic inference for out-of-sample documents, efficiently improving the generalization and discrimination of the topic encoding. Experimental results on two widely used corpora demonstrate the superiority of LWE-TM over comparative models in document modeling, document clustering, and classification tasks.",

keywords = "Affine mapping, Markov random walk, Sparse AutoEncoder, Topic model",

author = "Chao Wei and Senlin Luo and Limin Pan and Zhouting Wu and Ji Zhang and Safi, {Qamas Gul Khan}",

note = "Publisher Copyright: {\textcopyright} 2018 Elsevier Ltd",

year = "2018",

month = apr,

day = "12",

doi = "10.1016/j.neucom.2018.01.030",

language = "English",

volume = "285",

pages = "35--50",

journal = "Neurocomputing",

issn = "0925-2312",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - Locally weighted embedding topic modeling by Markov random walk structure approximation and sparse regularization

AU - Wei, Chao

AU - Luo, Senlin

AU - Pan, Limin

AU - Wu, Zhouting

AU - Zhang, Ji

AU - Safi, Qamas Gul Khan

PY - 2018/4/12

Y1 - 2018/4/12

AB - Topic modeling is a practical method for learning interpretable models of text corpora and has become a key problem in document representation. Some recently proposed topic models incorporate the intrinsic geometrical information of the document manifold and yield a discriminative topic representation. However, existing manifold-inspired topic models fail to provide probability weighting information for the local geometrical pattern, which limits their ability to estimate the intrinsic semantic information of the topic representation. In this paper, we consider the problem of topic modeling with the intrinsic structure of the document manifold and propose an unsupervised AutoEncoder-based topic modeling framework, named the locally weighted embedding topic model (LWE-TM). Unlike existing manifold-inspired topic models, LWE-TM defines a group of probability coefficients that uncover the local geometrical pattern through the Markov random walk structure of the affinity graph, and regularizes the training of a sparse AutoEncoder (sAE) to explicitly recover this local geometrical pattern in the topic encoding. Under the regularized training framework, the encoding network becomes locally invariant around the neighborhood of the document manifold and enables ready topic inference for out-of-sample documents, efficiently improving the generalization and discrimination of the topic encoding. Experimental results on two widely used corpora demonstrate the superiority of LWE-TM over comparative models in document modeling, document clustering, and classification tasks.

KW - Affine mapping

KW - Markov random walk

KW - Sparse AutoEncoder

KW - Topic model

UR - http://www.scopus.com/inward/record.url?scp=85044858410&partnerID=8YFLogxK

U2 - 10.1016/j.neucom.2018.01.030

DO - 10.1016/j.neucom.2018.01.030

M3 - Article

AN - SCOPUS:85044858410

SN - 0925-2312

VL - 285

SP - 35

EP - 50

JO - Neurocomputing

JF - Neurocomputing

ER -
