Computation on sentence semantic distance for novelty detection

Hua Ping Zhang*, Jian Sun, Bing Wang, Shuo Bai

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

9 Citations (Scopus)

Abstract

Novelty detection aims to retrieve new information and filter out redundancy from a given set of sentences that are relevant to a specific topic. In TREC 2003, the authors explored an approach to novelty detection based on semantic distance computation. The motivation is to expand each sentence with semantic information. The semantic distance between sentences is computed by combining WordNet with statistical information. Novelty detection is treated as a binary classification problem: a sentence is either new or not. The feature vector used for classification in the vector space model consists of various factors, including the semantic distance from the sentence to the topic and the distance from the sentence to the relevant context that precedes it. New sentences are then detected with Winnow and support vector machine (SVM) classifiers, respectively. Several experiments are conducted to examine the relationship between these factors and performance, and the results show that semantic computation is promising for novelty detection. The ratio of the number of new sentences to the number of relevant sentences is further studied for different numbers of relevant documents. This ratio is found to decrease at a certain rate (about 0.86). Another group of experiments is then performed with the ratio used as supervision, and these experiments demonstrate that the ratio helps improve novelty detection performance.
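The abstract frames novelty detection as binary classification over a feature vector and names Winnow as one of the two classifiers. The following is a minimal sketch of the classic Winnow algorithm (Littlestone's multiplicative-update rule for binary features), not the authors' implementation. It assumes the real-valued factors, such as the semantic distance to the topic and to the preceding relevant context, have already been thresholded into 0/1 indicators; the `binarize` helper, its thresholds, and the toy distances are hypothetical illustrations, not the paper's feature set or data.

```python
from typing import List, Sequence, Tuple


class Winnow:
    """Classic Winnow for binary features.

    Predicts 1 when sum(w_i * x_i) >= threshold. On a mistake, the weights of
    active features are multiplied by alpha (missed positive) or divided by
    alpha (false positive); inactive features are left unchanged.
    """

    def __init__(self, n_features: int, alpha: float = 2.0) -> None:
        self.alpha = alpha
        self.threshold = float(n_features)  # a common choice: theta = n
        self.weights = [1.0] * n_features

    def score(self, x: Sequence[int]) -> float:
        return sum(w * xi for w, xi in zip(self.weights, x))

    def predict(self, x: Sequence[int]) -> int:
        return 1 if self.score(x) >= self.threshold else 0

    def update(self, x: Sequence[int], y: int) -> None:
        if self.predict(x) == y:
            return  # Winnow only updates on mistakes
        factor = self.alpha if y == 1 else 1.0 / self.alpha
        for i, xi in enumerate(x):
            if xi:
                self.weights[i] *= factor

    def fit(self, data: List[Tuple[Sequence[int], int]], epochs: int = 10) -> "Winnow":
        for _ in range(epochs):
            for x, y in data:
                self.update(x, y)
        return self


def binarize(dist_to_topic: float, dist_to_context: float) -> List[int]:
    """Hypothetical mapping of two distances to binary features."""
    return [
        1 if dist_to_topic < 0.5 else 0,    # sentence is close to the topic
        1 if dist_to_context > 0.5 else 0,  # far from earlier relevant sentences
        1 if dist_to_context > 0.8 else 0,  # very far from earlier context
        1,                                  # always-on bias feature
    ]


# Toy data: label 1 = new sentence, 0 = redundant or off-topic.
train = [
    (binarize(0.2, 0.9), 1),
    (binarize(0.3, 0.6), 1),
    (binarize(0.2, 0.1), 0),
    (binarize(0.9, 0.2), 0),
]
model = Winnow(n_features=4).fit(train)
print(model.weights)                         # [1.0, 2.0, 1.0, 1.0]
print([model.predict(x) for x, _ in train])  # [1, 1, 0, 0]
```

Winnow suits this setting because its multiplicative updates tolerate many irrelevant features. An SVM (for example, scikit-learn's `sklearn.svm.SVC`) could instead be trained directly on the real-valued distances; it is left out here to keep the sketch dependency-free.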

Original language: English
Pages (from-to): 331-337
Number of pages: 7
Journal: Journal of Computer Science and Technology
Volume: 20
Issue number: 3
Publication status: Published - May 2005
Externally published: Yes

Keywords

  • Categorization
  • Novelty detection
  • Sentence semantic distance

