News topic detection based on hierarchical clustering and named entity

Sheng Huang, Xueping Peng, Zhendong Niu*, Kunshan Wang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

16 Citations (Scopus)

Abstract

News topic detection is the process of organizing news story collections and real-time news/broadcast streams into news topics. Unlike traditional text analysis, it is a process of incremental clustering and is generally divided into retrospective topic detection and online topic detection. This paper considers how the features of modern news data have changed over time and presents a new topic detection strategy based on hierarchical clustering and named entities. The topic detection process is likewise divided into retrospective and online steps, and named entities in the news stories are employed in the topic clustering algorithm. To improve the efficiency and precision of the online step, this paper first clusters the news stories in each time window into micro-clusters and then extracts three representation vectors for each micro-cluster to calculate its similarity to existing topics. The experimental results show a remarkable improvement over the most widely applied recent topic detection method.
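
The online step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the three representation vectors, the linkage criterion, or the similarity thresholds, so everything below is an assumption made for illustration: each story is a plain term-weight dictionary, each cluster is represented by a single centroid vector (standing in for the paper's three vectors), clusters are merged by centroid cosine similarity, and the function names `micro_cluster` and `assign_to_topics` are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Mean term-weight vector of a non-empty list of stories."""
    total = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}


def micro_cluster(stories, threshold):
    """Agglomerative clustering of one time window (assumed centroid linkage).

    Repeatedly merges the most similar pair of clusters until no pair
    reaches the threshold.
    """
    clusters = [[s] for s in stories]
    while len(clusters) > 1:
        best, pair = threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(centroid(clusters[i]), centroid(clusters[j]))
                if sim >= best:
                    best, pair = sim, (i, j)
        if pair is None:
            break
        i, j = pair
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return clusters


def assign_to_topics(micro_clusters, topics, threshold):
    """Attach each micro-cluster to its most similar topic, or open a new one."""
    for mc in micro_clusters:
        rep = centroid(mc)
        scores = [cosine(rep, centroid(t)) for t in topics]
        if scores and max(scores) >= threshold:
            topics[scores.index(max(scores))].extend(mc)
        else:
            topics.append(list(mc))
    return topics


# Toy time window: two earthquake stories and one election story.
window = [
    {"quake": 1, "japan": 1},
    {"quake": 1, "japan": 1, "tokyo": 1},
    {"election": 1, "vote": 1},
]
existing_topics = [[{"quake": 1, "japan": 1, "tsunami": 1}]]
micro = micro_cluster(window, threshold=0.5)
topics = assign_to_topics(micro, existing_topics, threshold=0.5)
```

Comparing whole micro-clusters rather than individual stories against the existing topics reduces the number of similarity computations per time window, which is one plausible reading of the efficiency claim in the abstract. A version closer to the paper would presumably weight named-entity terms separately from ordinary terms.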

Original language: English
Title of host publication: NLP-KE 2011 - Proceedings of the 7th International Conference on Natural Language Processing and Knowledge Engineering
Pages: 280-284
Number of pages: 5
Publication status: Published - 2011
Event: 7th International Conference on Natural Language Processing and Knowledge Engineering, NLP-KE 2011 - Tokushima, Japan
Duration: 27 Nov 2011 to 29 Nov 2011

Publication series

Name: NLP-KE 2011 - Proceedings of the 7th International Conference on Natural Language Processing and Knowledge Engineering

Conference

Conference: 7th International Conference on Natural Language Processing and Knowledge Engineering, NLP-KE 2011
Country/Territory: Japan
City: Tokushima
Period: 27/11/11 to 29/11/11

Keywords

  • agglomerative hierarchical clustering
  • named entity
  • news topic detection
  • vector space model
