TY - JOUR
T1 - Discriminative locally document embedding
T2 - Learning a smooth affine map by approximation of the probabilistic generative structure of subspace
AU - Wei, Chao
AU - Luo, Senlin
AU - Guo, Jia
AU - Wu, Zhouting
AU - Pan, Limin
N1 - Publisher Copyright: © 2017
PY - 2017/4/1
Y1 - 2017/4/1
AB - Document embedding is a technique that captures informative representations from high-dimensional observations through structure-preserving maps over a corpus, and it has been intensively explored in machine learning. Recently, manifold-inspired embedding methods have become a hot topic, mainly due to their ability to capture discriminative embeddings. However, existing methods derive embeddings from the geometrical information of nearest neighbors without considering the intrinsic document-generating structure on a subspace, which limits their ability to uncover intrinsic semantic information. In this paper, we propose a semi-supervised, locally invariant method called Discriminative Locally Document Embedding (Disc-LDE), which aims to build a smooth affine map for document embedding by preserving the document-generating structure on a subspace. Disc-LDE models the document-generating structure as a pseudo-document using a generative probabilistic model of the subspace, where the subspace is acquired by transductive learning of a multi-agent random walk on a neighborhood graph, and regularizes the training of auto-encoders (AEs) to jointly recover the input document and its pseudo-document. Under a general regularized function learning framework, the regularized training makes the parameterized encoder network smooth with respect to variations along the document-generating structure of the local field on the manifold. Experimental results on three widely used corpora demonstrate that Disc-LDE efficiently captures the intrinsic semantic structure, improving clustering and classification performance compared with state-of-the-art methods.
KW - Document embedding
KW - Generative probabilistic model
KW - Multi-agent random walk
KW - Regularized auto-encoders
KW - Smooth affine map
UR - http://www.scopus.com/inward/record.url?scp=85009827512&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2017.01.012
DO - 10.1016/j.knosys.2017.01.012
M3 - Article
AN - SCOPUS:85009827512
SN - 0950-7051
VL - 121
SP - 41
EP - 57
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
ER -