TY - GEN
T1 - A hot topic detection method for Chinese Microblog based on topic words
AU - Zheng, Jun
AU - Li, Yuanjun
N1 - Publisher Copyright:
© 2014 IEEE.
PY - 2014/5/11
Y1 - 2014/5/11
N2 - Microblog is a new kind of network medium that has sprung up quickly. Detecting and tracking hot topics in Microblog data has attracted wide attention from scholars at home and abroad in recent years. Algorithms designed to find topics in long texts, such as those on traditional news websites and blogs, cannot effectively handle Microblog data, which is sparse. This paper proposes a method for identifying hot topics in Microblog based on topic words. After preprocessing the Microblog data and dividing it into time windows, the method extracts topic words in each time window according to two factors: the rate of increase of word frequency and the relative word frequency. It then clusters the topic words according to the similarity among them and selects a suitable cluster of topic words to describe each hot topic, thereby detecting hot topics in Microblog. Experimental verification shows that the method improves detection efficiency to a certain extent and raises both recall and precision, so that hot topics in Microblog can be found effectively and in a timely manner.
AB - Microblog is a new kind of network medium that has sprung up quickly. Detecting and tracking hot topics in Microblog data has attracted wide attention from scholars at home and abroad in recent years. Algorithms designed to find topics in long texts, such as those on traditional news websites and blogs, cannot effectively handle Microblog data, which is sparse. This paper proposes a method for identifying hot topics in Microblog based on topic words. After preprocessing the Microblog data and dividing it into time windows, the method extracts topic words in each time window according to two factors: the rate of increase of word frequency and the relative word frequency. It then clusters the topic words according to the similarity among them and selects a suitable cluster of topic words to describe each hot topic, thereby detecting hot topics in Microblog. Experimental verification shows that the method improves detection efficiency to a certain extent and raises both recall and precision, so that hot topics in Microblog can be found effectively and in a timely manner.
KW - Microblog
KW - TDT
KW - clustering algorithm
KW - hot topic
KW - topic
KW - word
UR - http://www.scopus.com/inward/record.url?scp=84983142828&partnerID=8YFLogxK
U2 - 10.1109/ICITEC.2014.7105615
DO - 10.1109/ICITEC.2014.7105615
M3 - Conference contribution
AN - SCOPUS:84983142828
T3 - Proceedings of 2nd International Conference on Information Technology and Electronic Commerce, ICITEC 2014
SP - 262
EP - 266
BT - Proceedings of 2nd International Conference on Information Technology and Electronic Commerce, ICITEC 2014
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2nd International Conference on Information Technology and Electronic Commerce, ICITEC 2014
Y2 - 20 December 2014 through 21 December 2014
ER -