Abstract
The k-nearest neighbor (KNN) algorithm offers high accuracy and stability. However, because its time complexity is directly proportional to the sample size, it classifies slowly and is difficult to apply in large-scale information processing. An improved KNN text categorization algorithm is proposed that classifies faster than traditional KNN. First, similar sample documents are merged into a center document using automatic text clustering. Then, the large number of original samples is replaced with the much smaller number of cluster centers. This greatly reduces the amount of computation KNN requires and speeds up classification. Experimental results show that the proposed algorithm reduces time complexity by one order of magnitude, while its accuracy is approximately equal to that of SVM and traditional KNN.
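The two steps described above can be illustrated with a minimal Python sketch. The abstract does not say which clustering method, similarity measure, or parameters the authors use, so everything here is an assumption made for illustration. The sketch uses single-pass clustering within each class, cosine similarity, and hypothetical names (`cluster_centers`, `build_reduced_training_set`, `knn_classify`, `threshold`). It should not be read as the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cluster_centers(vectors, threshold):
    """Single-pass clustering (an assumed stand-in for the paper's method).

    Each vector joins the most similar existing cluster if the similarity
    to that cluster's center is at least `threshold`; otherwise it starts
    a new cluster. Returns the list of cluster centers.
    """
    clusters, centers = [], []
    for vec in vectors:
        best_index, best_sim = None, threshold
        for i, center in enumerate(centers):
            sim = cosine(vec, center)
            if sim >= best_sim:
                best_index, best_sim = i, sim
        if best_index is None:
            clusters.append([vec])
            centers.append(list(vec))
        else:
            clusters[best_index].append(vec)
            centers[best_index] = mean_vector(clusters[best_index])
    return centers


def build_reduced_training_set(samples, threshold=0.8):
    """Replace labeled samples with labeled cluster centers.

    Clustering is done separately per class (an assumption), so every
    center inherits an unambiguous label.
    """
    by_label = defaultdict(list)
    for vec, label in samples:
        by_label[label].append(vec)
    reduced = []
    for label, vecs in by_label.items():
        for center in cluster_centers(vecs, threshold):
            reduced.append((center, label))
    return reduced


def knn_classify(query, training, k=3):
    """Majority vote among the k training items most similar to `query`."""
    ranked = sorted(training, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy term-weight vectors; real documents would use e.g. TF-IDF weights.
    samples = [
        ([1.0, 0.0, 0.0], "sports"), ([0.9, 0.1, 0.0], "sports"),
        ([0.0, 1.0, 0.0], "tech"), ([0.0, 0.9, 0.1], "tech"),
    ]
    reduced = build_reduced_training_set(samples, threshold=0.8)
    print(len(samples), "samples reduced to", len(reduced), "centers")
    print(knn_classify([0.8, 0.2, 0.0], reduced, k=1))
```

Under these assumptions, classifying a query costs one similarity computation per cluster center instead of one per training document, which is where the speed-up comes from. The `threshold` value trades speed against accuracy: a lower threshold yields fewer, coarser centers.
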
Original language | English |
---|---|
Pages (from-to) | 936-940 |
Number of pages | 5 |
Journal | Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence |
Volume | 22 |
Issue number | 6 |
Publication status | Published - Dec 2009 |
Externally published | Yes |
Keywords
- Cluster center
- Natural language processing (NLP)
- Text categorization
- Text clustering
- k-Nearest neighbor (KNN)