Improving neural topic modeling via Sinkhorn divergence

Luyang Liu, Heyan Huang*, Yang Gao, Yongfeng Zhang

*Corresponding author of this work

Research output: Contribution to journal › Article › peer-review

9 Citations (Scopus)

Abstract

Textual data have become a major medium through which internet users convey content, so discovering the latent topics within them effectively and efficiently has essential theoretical and practical value. Recently, neural topic models (NTMs), especially Variational Auto-encoder-based NTMs, have proved to be a successful approach for mining meaningful and interpretable topics. However, they usually suffer from two major issues: (1) Posterior collapse: the KL divergence rapidly reaches zero, resulting in low-quality representations in the latent distribution; (2) Unconstrained topic generative models: topic generative models are typically unconstrained, which can lead to discovering redundant topics. To address these issues, we propose the Autoencoding Sinkhorn Topic Model, based on the Sinkhorn Auto-encoder (SAE) and Sinkhorn divergence. The SAE uses Sinkhorn divergence rather than the problematic KL divergence to optimize the discrepancy between the posterior and the prior, and is therefore free of posterior collapse. Then, to reduce topic redundancy, Sinkhorn Topic Diversity Regularization (STDR) is presented. STDR leverages the proposed Salient Topic Layer and Sinkhorn divergence to measure the distance between salient topic features, and serves as a penalty term in the loss function that facilitates discovering diversified topics during training. Several experiments were conducted on two popular datasets to verify our contributions. The experimental results demonstrate the effectiveness of the proposed model.
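The Sinkhorn divergence at the core of the abstract is the debiased, entropy-regularized optimal-transport discrepancy S_ε(a, b) = OT_ε(a, b) − ½·OT_ε(a, a) − ½·OT_ε(b, b), where each OT_ε term is computed with Sinkhorn's matrix-scaling iterations. The sketch below is a minimal NumPy illustration of that general construction, not the paper's implementation; the function names, the regularization strength `eps`, and the iteration count are illustrative assumptions.

```python
import numpy as np

def sinkhorn_cost(a, b, C, eps=0.1, n_iters=200):
    """Entropy-regularized OT cost OT_eps(a, b) via Sinkhorn iterations.

    a, b : probability histograms (1-D, strictly positive, summing to 1)
    C    : pairwise ground-cost matrix between the two supports
    """
    K = np.exp(-C / eps)               # Gibbs kernel of the cost matrix
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)              # rescale columns to match marginal b
        u = a / (K @ v)                # rescale rows to match marginal a
    P = u[:, None] * K * v[None, :]    # approximate optimal transport plan
    return float(np.sum(P * C))        # transport cost under plan P

def sinkhorn_divergence(a, b, C_ab, C_aa, C_bb, eps=0.1):
    """Debiased Sinkhorn divergence S_eps(a, b); zero iff a == b."""
    return (sinkhorn_cost(a, b, C_ab, eps)
            - 0.5 * sinkhorn_cost(a, a, C_aa, eps)
            - 0.5 * sinkhorn_cost(b, b, C_bb, eps))
```

Subtracting the two self-transport terms removes the entropic bias of plain Sinkhorn cost, which is what makes S_ε(a, a) = 0 and S_ε(a, b) > 0 for a ≠ b, so it behaves like a proper discrepancy when matching a posterior to a prior.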

Original language: English
Article number: 102864
Journal: Information Processing and Management
Volume: 59
Issue: 3
DOI
Publication status: Published - May 2022


Cite this

Liu, L., Huang, H., Gao, Y., & Zhang, Y. (2022). Improving neural topic modeling via Sinkhorn divergence. Information Processing and Management, 59(3), Article 102864. https://doi.org/10.1016/j.ipm.2021.102864