Locally embedding autoencoders: A semi-supervised manifold learning approach of document representation

Chao Wei; Senlin Luo; Xincheng Ma; Hao Ren; Ji Zhang; Limin Pan

doi:10.1371/journal.pone.0146672

Beijing Institute of Technology

Research output: Contribution to journal › Article › Peer-review

22 Citations (Scopus)

Abstract

Topic models and neural networks can discover meaningful low-dimensional latent representations of text corpora; as such, they have become a key technology for document representation. However, such models assume that all documents are non-discriminatory, so each document's latent representation depends on all other documents and the models cannot provide discriminative document representations. To address this problem, we propose a semi-supervised manifold-inspired autoencoder that extracts meaningful latent representations of documents, taking the local perspective that the latent representations of nearby documents should be correlated. We first determine a discriminative neighbor set for each document using Euclidean distance in the observation space. The autoencoder is then trained by jointly minimizing the Bernoulli cross-entropy error between input and output and the sum of squared errors between the neighbors of the input and the output. Results on two widely used corpora show that our method yields at least a 15% improvement in document clustering and a nearly 7% improvement in classification tasks compared with the comparison methods. This evidence demonstrates that our method can readily capture more discriminative latent representations of new documents. Moreover, some meaningful combinations of words can be efficiently discovered by activating features, which promotes the comprehensibility of the latent representation.
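The two training steps described in the abstract (neighbor selection, then a joint objective) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The helper names (`discriminative_neighbors`, `forward`, `lea_loss`) are hypothetical. The rule that a labeled document may only take neighbors with the same label, while an unlabeled document (marked `-1`) may take any neighbor, is a guess at how the "discriminative" neighbor set uses supervision. The tied-weight sigmoid encoder/decoder and the weight `lam` on the neighbor term are also assumptions. Finally, "the sum of squared errors between the neighbors of the input and the output" is read here as the squared distance between a document's reconstruction and each of that document's neighbors. Inputs are assumed to be binary bag-of-words vectors so that the Bernoulli cross-entropy is well defined.

```python
import numpy as np


def discriminative_neighbors(X, labels, k):
    """Hypothetical helper: up to k nearest neighbors of each row of X.

    Distances are Euclidean in the observation (bag-of-words) space.
    Assumption: a labeled document (label >= 0) only accepts neighbors with
    the same label; an unlabeled document (label == -1) accepts any document.
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)  # a document is never its own neighbor
    neighbors = []
    for i in range(X.shape[0]):
        d = d2[i].copy()
        if labels[i] >= 0:
            d[labels != labels[i]] = np.inf  # drop candidates from other classes
        order = np.argsort(d, kind="stable")
        order = order[np.isfinite(d[order])]  # discard excluded candidates
        neighbors.append(order[:k])
    return neighbors


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(X, W, b, c):
    """Assumed architecture: one sigmoid hidden layer with tied weights."""
    H = sigmoid(X @ W + b)    # latent document representations
    Y = sigmoid(H @ W.T + c)  # reconstructions in (0, 1)
    return H, Y


def lea_loss(X, Y, neighbors, lam=0.1, eps=1e-12):
    """Bernoulli cross-entropy plus lam * neighbor squared error (assumed form)."""
    Yc = np.clip(Y, eps, 1.0 - eps)  # avoid log(0)
    ce = -np.sum(X * np.log(Yc) + (1.0 - X) * np.log(1.0 - Yc))
    # For each document i, compare its reconstruction Y[i] with every neighbor X[j].
    sse = sum(np.sum((X[nb] - Y[i]) ** 2) for i, nb in enumerate(neighbors))
    return ce + lam * sse
```

With `lam=0` the objective reduces to an ordinary autoencoder's cross-entropy; increasing `lam` pulls each reconstruction toward its neighbors in the observation space. Actually minimizing it requires gradient-based training (for example, with an automatic-differentiation library), which this sketch omits.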

Original language	English
Article number	e0146672
Journal	PLoS ONE
Volume	11
Issue	1
DOI	https://doi.org/10.1371/journal.pone.0146672
Publication status	Published - 1 Jan 2016

Cite this

@article{de2a9ccb3fb64ecf8102d64dfe343d99,

title = "Locally embedding autoencoders: A semi-supervised manifold learning approach of document representation",

abstract = "Topic models and neural networks can discover meaningful low-dimensional latent representations of text corpora; as such, they have become a key technology of document representation. However, such models presume all documents are non-discriminatory, resulting in latent representation dependent upon all other documents and an inability to provide discriminative document representation. To address this problem, we propose a semi-supervised manifold-inspired autoencoder to extract meaningful latent representations of documents, taking the local perspective that the latent representation of nearby documents should be correlative. We first determine the discriminative neighbors set with Euclidean distance in observation spaces. Then, the autoencoder is trained by joint minimization of the Bernoulli cross-entropy error between input and output and the sum of the square error between neighbors of input and output. The results of two widely used corpora show that our method yields at least a 15% improvement in document clustering and a nearly 7% improvement in classification tasks compared to comparative methods. The evidence demonstrates that our method can readily capture more discriminative latent representation of new documents. Moreover, some meaningful combinations of words can be efficiently discovered by activating features that promote the comprehensibility of latent representation.",

author = "Chao Wei and Senlin Luo and Xincheng Ma and Hao Ren and Ji Zhang and Limin Pan",

note = "Publisher Copyright: {\textcopyright} 2016 Wei et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.",

year = "2016",

month = jan,

day = "1",

doi = "10.1371/journal.pone.0146672",

language = "English",

volume = "11",

journal = "PLoS ONE",

issn = "1932-6203",

publisher = "Public Library of Science",

number = "1",

}

TY - JOUR

T1 - Locally embedding autoencoders

T2 - A semi-supervised manifold learning approach of document representation

AU - Wei, Chao

AU - Luo, Senlin

AU - Ma, Xincheng

AU - Ren, Hao

AU - Zhang, Ji

AU - Pan, Limin

N1 - Publisher Copyright: © 2016 Wei et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

PY - 2016/1/1

Y1 - 2016/1/1

N2 - Topic models and neural networks can discover meaningful low-dimensional latent representations of text corpora; as such, they have become a key technology of document representation. However, such models presume all documents are non-discriminatory, resulting in latent representation dependent upon all other documents and an inability to provide discriminative document representation. To address this problem, we propose a semi-supervised manifold-inspired autoencoder to extract meaningful latent representations of documents, taking the local perspective that the latent representation of nearby documents should be correlative. We first determine the discriminative neighbors set with Euclidean distance in observation spaces. Then, the autoencoder is trained by joint minimization of the Bernoulli cross-entropy error between input and output and the sum of the square error between neighbors of input and output. The results of two widely used corpora show that our method yields at least a 15% improvement in document clustering and a nearly 7% improvement in classification tasks compared to comparative methods. The evidence demonstrates that our method can readily capture more discriminative latent representation of new documents. Moreover, some meaningful combinations of words can be efficiently discovered by activating features that promote the comprehensibility of latent representation.

AB - Topic models and neural networks can discover meaningful low-dimensional latent representations of text corpora; as such, they have become a key technology of document representation. However, such models presume all documents are non-discriminatory, resulting in latent representation dependent upon all other documents and an inability to provide discriminative document representation. To address this problem, we propose a semi-supervised manifold-inspired autoencoder to extract meaningful latent representations of documents, taking the local perspective that the latent representation of nearby documents should be correlative. We first determine the discriminative neighbors set with Euclidean distance in observation spaces. Then, the autoencoder is trained by joint minimization of the Bernoulli cross-entropy error between input and output and the sum of the square error between neighbors of input and output. The results of two widely used corpora show that our method yields at least a 15% improvement in document clustering and a nearly 7% improvement in classification tasks compared to comparative methods. The evidence demonstrates that our method can readily capture more discriminative latent representation of new documents. Moreover, some meaningful combinations of words can be efficiently discovered by activating features that promote the comprehensibility of latent representation.

UR - http://www.scopus.com/inward/record.url?scp=84958191631&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0146672

DO - 10.1371/journal.pone.0146672

M3 - Article

C2 - 26784692

AN - SCOPUS:84958191631

SN - 1932-6203

VL - 11

JO - PLoS ONE

JF - PLoS ONE

IS - 1

M1 - e0146672

ER -
