Abstract
Topic model is a practical method for learning interpretable models of text corpora and have become a key problem of document representation. Some recently proposed topic models incorporate the intrinsic geometrical information of the document manifold and yield a discriminative topic representation. However, the existing manifold-inspired topic models fail to provide the probability weighting information of local geometrical pattern, thus leads to a limitation to estimate intrinsic semantic information of topic representation. In this paper, we consider the problem of topic modeling with intrinsic structure of document manifold and propose an unsupervised AutoEncoder-based topic modeling framework, named locally weighted embedding topic model (LWE-TM). Different from existing manifold-inspired topic models, LWE-TM defines a group of probability coefficients to uncover the local geometrical pattern by the Markov random walk structure of affinity graph, and regularizes the training of sparse AutoEncoder (sAE) to explicitly recover such local geometrical pattern with the topics encoding. Under the regularized training framework, the encoding network becomes local-invariant around the neighborhood of the document manifold and enable us to perform a readily topic inference for out-of-sample documents, efficiently improving the generalization and discrimination of topics encoding. The experimental results on two widely-used corpus demonstrate the superiority of LWE-TM to comparative models in document modeling, document clustering and classification tasks.
Original language | English |
---|---|
Pages (from-to) | 35-50 |
Number of pages | 16 |
Journal | Neurocomputing |
Volume | 285 |
DOIs | |
Publication status | Published - 12 Apr 2018 |
Keywords
- Affine mapping
- Markov random walk
- Sparse AutoEncoder
- Topic model