TY - JOUR
T1 - Self-adaptive topic model
T2 - A solution to the problem of rich topics get richer
AU - Fang, Ying
AU - Huang, Heyan
AU - Jian, Ping
AU - Xin, Xin
AU - Feng, Chong
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2014/12/1
Y1 - 2014/12/1
AB - The problem of "rich topics get richer" (RTGR) is common in topic models: if no intervention is made in the topic distribution process, it produces an incorrect topic distribution. In the standard LDA (Latent Dirichlet Allocation) model, every word in every document has the same statistical weight. In fact, words have different influence on different topics. Guided by this observation, we extend ILDA (Infinite LDA) by taking into account the bias of words in dividing topics, and propose a self-adaptive topic model designed specifically to overcome the RTGR problem. The proposed model addresses three issues: (1) the number of topics changes with the document collection, which suits dynamic data; (2) words have discriminative attributes with respect to the topic distribution; (3) a self-adaptive method performs re-sampling automatically. To verify the model, we design a topic evolution analysis system that provides the following functions: topic classification in each cycle, topic correlation between adjacent cycles, and strength calculation of sub-topics in sequence. Experiments on both the NIPS corpus and our self-built news collection showed that the system meets the given requirements and that the results are feasible.
KW - Dirichlet process
KW - infinite Latent Dirichlet Allocation
KW - topic evolution
KW - topic model
UR - http://www.scopus.com/inward/record.url?scp=84921877629&partnerID=8YFLogxK
U2 - 10.1109/CC.2014.7019838
DO - 10.1109/CC.2014.7019838
M3 - Article
AN - SCOPUS:84921877629
SN - 1673-5447
VL - 11
SP - 35
EP - 43
JO - China Communications
JF - China Communications
IS - 12
ER -