TY - GEN
T1 - A novel fast framework for topic labeling based on similarity-preserved hashing
AU - Mao, Xian Ling
AU - Hao, Yi Jing
AU - Zhou, Qiang
AU - Yuan, Wen Qing
AU - Yang, Liner
AU - Huang, Heyan
N1 - Publisher Copyright:
© 1963-2018 ACL.
PY - 2016
Y1 - 2016
N2 - Recently, topic modeling has been widely applied in data mining due to its powerful ability. A common, major challenge in applying such topic models to other tasks is to accurately interpret the meaning of each topic. Topic labeling, as a major interpreting method, has attracted significant attention recently. However, most of previous works only focus on the effectiveness of topic labeling, and less attention has been paid to quickly creating good topic descriptors; Meanwhile, it's hard to assign labels for new emerging topics by using most of existing methods. To solve the problems above, in this paper, we propose a novel fast topic labeling framework that casts the labeling problem as a k-nearest neighbor (KNN) search problem in probability distributions. Our experimental results show that the proposed sequential interleaving method based on locality sensitive hashing (LSH) technology is efficient in boosting the comparison speed among probability distributions, and the proposed framework can generate meaningful labels to interpret topics, including new emerging topics.
AB - Recently, topic modeling has been widely applied in data mining due to its powerful ability. A common, major challenge in applying such topic models to other tasks is to accurately interpret the meaning of each topic. Topic labeling, as a major interpreting method, has attracted significant attention recently. However, most of previous works only focus on the effectiveness of topic labeling, and less attention has been paid to quickly creating good topic descriptors; Meanwhile, it's hard to assign labels for new emerging topics by using most of existing methods. To solve the problems above, in this paper, we propose a novel fast topic labeling framework that casts the labeling problem as a k-nearest neighbor (KNN) search problem in probability distributions. Our experimental results show that the proposed sequential interleaving method based on locality sensitive hashing (LSH) technology is efficient in boosting the comparison speed among probability distributions, and the proposed framework can generate meaningful labels to interpret topics, including new emerging topics.
UR - https://www.scopus.com/pages/publications/85054984796
M3 - Conference contribution
AN - SCOPUS:85054984796
SN - 9784879747020
T3 - COLING 2016 - 26th International Conference on Computational Linguistics, Proceedings of COLING 2016: Technical Papers
SP - 3339
EP - 3348
BT - COLING 2016 - 26th International Conference on Computational Linguistics, Proceedings of COLING 2016
PB - Association for Computational Linguistics, ACL Anthology
T2 - 26th International Conference on Computational Linguistics, COLING 2016
Y2 - 11 December 2016 through 16 December 2016
ER -