KNN text categorization algorithm based on semantic centre

Xiao Fei Zhang*, He Yan Huang, Ke Liang Zhang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

6 Citations (Scopus)

Abstract

As a classical statistical pattern recognition algorithm characterized by high accuracy and stability, KNN has been widely used in text categorization. However, because KNN's time complexity is directly proportional to the sample size, its classification speed is very slow. In this paper, we propose a new KNN text categorization algorithm based on semantic centres, which we call SKNN, to speed up classification. The basic idea is to replace the large number of original sample documents with a small number of sample semantic centres. Experiments show that SKNN classifies more than 10 times as fast as traditional KNN, while its F1 value is approximately equal to that of SVM and traditional KNN.
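The idea of swapping the full sample set for a few centres can be illustrated with a minimal sketch. The abstract does not say how a semantic centre is computed, so this sketch assumes a plain per-class centroid as a stand-in, cosine similarity as the distance measure, and toy three-term vectors. The function names (`build_centers`, `knn_classify`) and the data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_centers(samples):
    """Collapse (vector, label) samples to one centre per label.

    Assumption: a plain centroid stands in for the paper's semantic
    centre, whose actual construction is not given in the abstract.
    """
    by_label = defaultdict(list)
    for vec, label in samples:
        by_label[label].append(vec)
    return [(centroid(vecs), label) for label, vecs in by_label.items()]


def knn_classify(query, references, k=1):
    """Majority vote among the k references most similar to the query."""
    ranked = sorted(references, key=lambda r: cosine(query, r[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy term-weight vectors (hypothetical data).
samples = [
    ([3.0, 0.0, 1.0], "sports"),
    ([2.0, 1.0, 0.0], "sports"),
    ([0.0, 3.0, 1.0], "finance"),
    ([1.0, 2.0, 0.0], "finance"),
]

# Four samples shrink to two centres, so each query needs two
# similarity computations instead of four.
centers = build_centers(samples)
label = knn_classify([2.0, 0.5, 0.0], centers, k=1)
```

In this sketch the query is compared against one centre per class rather than against every training document, which is the source of the speed-up the abstract describes. How closely accuracy tracks full KNN depends on how well the centres represent each class.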

Original language: English
Title of host publication: Proceedings - 2009 International Conference on Information Technology and Computer Science, ITCS 2009
Pages: 249-252
Number of pages: 4
Publication status: Published - 2009
Externally published: Yes
Event: 2009 International Conference on Information Technology and Computer Science, ITCS 2009 - Kiev, Ukraine
Duration: 25 Jul 2009 to 26 Jul 2009

Publication series

Name: Proceedings - 2009 International Conference on Information Technology and Computer Science, ITCS 2009
Volume: 1

Conference

Conference: 2009 International Conference on Information Technology and Computer Science, ITCS 2009
Country/Territory: Ukraine
City: Kiev
Period: 25/07/09 to 26/07/09

Keywords

  • KNN
  • Semantic center
  • Text categorization
