Locally embedding autoencoders: A semi-supervised manifold learning approach of document representation

Chao Wei, Senlin Luo, Xincheng Ma, Hao Ren, Ji Zhang, Limin Pan

Research output: Contribution to journal, Article, peer-reviewed

22 Citations (Scopus)

Abstract

Topic models and neural networks can discover meaningful low-dimensional latent representations of text corpora; as such, they have become a key technology of document representation. However, such models presume all documents are non-discriminatory, resulting in latent representation dependent upon all other documents and an inability to provide discriminative document representation. To address this problem, we propose a semi-supervised manifold-inspired autoencoder to extract meaningful latent representations of documents, taking the local perspective that the latent representation of nearby documents should be correlative. We first determine the discriminative neighbors set with Euclidean distance in observation spaces. Then, the autoencoder is trained by joint minimization of the Bernoulli cross-entropy error between input and output and the sum of the square error between neighbors of input and output. The results of two widely used corpora show that our method yields at least a 15% improvement in document clustering and a nearly 7% improvement in classification tasks compared to comparative methods. The evidence demonstrates that our method can readily capture more discriminative latent representation of new documents. Moreover, some meaningful combinations of words can be efficiently discovered by activating features that promote the comprehensibility of latent representation.
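The training objective described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `nearest_neighbors` and `joint_loss`, the neighbor count `k`, and the weight `lam` between the two terms are all assumptions, since the abstract does not say how the terms are balanced or how label information enters the neighbor selection. Here neighbors are chosen purely by Euclidean distance in the observation space.

```python
import numpy as np

def nearest_neighbors(X, k):
    """Return, for each row of X, the indices of its k nearest rows
    by Euclidean distance, excluding the row itself (hypothetical helper)."""
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances via the expansion of ||a - b||^2.
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a document is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]

def joint_loss(X, X_hat, neighbors, lam=1.0, eps=1e-12):
    """Bernoulli cross-entropy between input and output, plus the summed
    squared error between each output and the neighbors of its input.
    `lam` is an assumed trade-off weight, not taken from the paper."""
    X_hat = np.clip(X_hat, eps, 1.0 - eps)  # keep the logarithms finite
    cross_entropy = -np.sum(X * np.log(X_hat) + (1.0 - X) * np.log(1.0 - X_hat))
    # X[neighbors] has shape (n, k, d); broadcast each output against its k neighbors.
    diff = X[neighbors] - X_hat[:, None, :]
    local_term = np.sum(diff ** 2)
    return cross_entropy + lam * local_term
```

In use, `X` would hold document vectors scaled to [0, 1] and `X_hat` the autoencoder's reconstructions; the second term pulls each reconstruction toward the neighbors of its input, which is one way to read the abstract's claim that nearby documents should have correlated latent representations.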

Original language: English
Article number: e0146672
Journal: PLoS ONE
Volume: 11
Issue number: 1
DOI
Publication status: Published - 1 Jan 2016
