TY - GEN
T1 - News topic detection based on hierarchical clustering and named entity
AU - Huang, Sheng
AU - Peng, Xueping
AU - Niu, Zhendong
AU - Wang, Kunshan
PY - 2011
Y1 - 2011
N2 - News topic detection is the process of organizing news story collections and real-time news/broadcast streams into news topics. Unlike traditional text analysis, it is a process of incremental clustering and is generally divided into retrospective topic detection and online topic detection. This paper considers how the features of modern news data have changed over time and presents a new topic detection strategy based on hierarchical clustering and named entities. The topic detection process is likewise divided into retrospective and online steps, and named entities in the news stories are employed in the topic clustering algorithm. To improve the efficiency and precision of the online step, the method first clusters the news stories in each time window into micro-clusters and then extracts three representation vectors for each micro-cluster to calculate its similarity to existing topics. The experimental results show a remarkable improvement over the most widely applied recent topic detection method.
AB - News topic detection is the process of organizing news story collections and real-time news/broadcast streams into news topics. Unlike traditional text analysis, it is a process of incremental clustering and is generally divided into retrospective topic detection and online topic detection. This paper considers how the features of modern news data have changed over time and presents a new topic detection strategy based on hierarchical clustering and named entities. The topic detection process is likewise divided into retrospective and online steps, and named entities in the news stories are employed in the topic clustering algorithm. To improve the efficiency and precision of the online step, the method first clusters the news stories in each time window into micro-clusters and then extracts three representation vectors for each micro-cluster to calculate its similarity to existing topics. The experimental results show a remarkable improvement over the most widely applied recent topic detection method.
KW - agglomerative hierarchical clustering
KW - named entity
KW - news topic detection
KW - vector space model
UR - http://www.scopus.com/inward/record.url?scp=84863119926&partnerID=8YFLogxK
U2 - 10.1109/NLPKE.2011.6138209
DO - 10.1109/NLPKE.2011.6138209
M3 - Conference contribution
AN - SCOPUS:84863119926
SN - 9781612847283
T3 - NLP-KE 2011 - Proceedings of the 7th International Conference on Natural Language Processing and Knowledge Engineering
SP - 280
EP - 284
BT - NLP-KE 2011 - Proceedings of the 7th International Conference on Natural Language Processing and Knowledge Engineering
T2 - 7th International Conference on Natural Language Processing and Knowledge Engineering, NLP-KE 2011
Y2 - 27 November 2011 through 29 November 2011
ER -