Improving neural topic modeling via Sinkhorn divergence

Luyang Liu, Heyan Huang*, Yang Gao, Yongfeng Zhang

*Corresponding author of this work

Research output: Contribution to journal › Article › peer-review

9 Citations (Scopus)

Abstract

Textual data have become a major medium through which internet users convey content, so discovering the latent topics within them effectively and efficiently has essential theoretical and practical value. Recently, neural topic models (NTMs), especially Variational Auto-encoder-based NTMs, have proved to be a successful approach for mining meaningful and interpretable topics. However, they usually suffer from two major issues: (1) Posterior collapse: the KL divergence rapidly reaches zero, resulting in low-quality representations in the latent distribution; (2) Unconstrained topic generative models: topic generative models are typically unconstrained, which can lead to discovering redundant topics. To address these issues, we propose the Autoencoding Sinkhorn Topic Model, based on the Sinkhorn Auto-encoder (SAE) and Sinkhorn divergence. The SAE uses Sinkhorn divergence rather than the problematic KL divergence to optimize the discrepancy between the posterior and the prior, and is therefore free of posterior collapse. Then, to reduce topic redundancy, Sinkhorn Topic Diversity Regularization (STDR) is presented. STDR leverages the proposed Salient Topic Layer and Sinkhorn divergence to measure the distance between salient topic features, and serves as a penalty term in the loss function that facilitates discovering diversified topics during training. Several experiments were conducted on two popular datasets to verify our contributions. The experimental results demonstrate the effectiveness of the proposed model.
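The Sinkhorn divergence at the core of the abstract is the debiased, entropy-regularized optimal-transport discrepancy S_ε(a, b) = OT_ε(a, b) − ½·OT_ε(a, a) − ½·OT_ε(b, b), where each OT_ε term is computed with Sinkhorn's matrix-scaling iterations. The sketch below is a minimal NumPy illustration of that general construction, not the paper's implementation; the function names, the regularization strength `eps`, and the iteration count are illustrative assumptions.

```python
import numpy as np

def sinkhorn_cost(a, b, C, eps=0.1, n_iters=200):
    """Entropy-regularized OT cost OT_eps(a, b) via Sinkhorn iterations.

    a, b : probability histograms (1-D, strictly positive, summing to 1)
    C    : pairwise ground-cost matrix between the two supports
    """
    K = np.exp(-C / eps)               # Gibbs kernel of the cost matrix
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)              # rescale columns to match marginal b
        u = a / (K @ v)                # rescale rows to match marginal a
    P = u[:, None] * K * v[None, :]    # approximate optimal transport plan
    return float(np.sum(P * C))        # transport cost under plan P

def sinkhorn_divergence(a, b, C_ab, C_aa, C_bb, eps=0.1):
    """Debiased Sinkhorn divergence S_eps(a, b); zero iff a == b."""
    return (sinkhorn_cost(a, b, C_ab, eps)
            - 0.5 * sinkhorn_cost(a, a, C_aa, eps)
            - 0.5 * sinkhorn_cost(b, b, C_bb, eps))
```

Subtracting the two self-transport terms removes the entropic bias of plain Sinkhorn cost, which is what makes S_ε(a, a) = 0 and S_ε(a, b) > 0 for a ≠ b, so it behaves like a proper discrepancy when matching a posterior to a prior.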

Original language: English
Article number: 102864
Journal: Information Processing and Management
Volume: 59
Issue: 3
DOI
Publication status: Published - May 2022


Cite this

Liu, L., Huang, H., Gao, Y., & Zhang, Y. (2022). Improving neural topic modeling via Sinkhorn divergence. Information Processing and Management, 59(3), Article 102864. https://doi.org/10.1016/j.ipm.2021.102864