A kind of self-constructed category dictionary in Chinese text classification

Kun Zhou*, Ya Ping Dai, Feng Gao, Ji Hong Zou

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Using the word-segmentation technology of the TRIP database and a detailed count of each word that appears in the database, this paper proposes a self-constructed category dictionary (SCC-dictionary) for Chinese text classification. To address the high dimensionality and sparseness problems that exist in the vector space model, a four-dimensional feature vector space model (FFVSM) is presented. The text classifier is designed with the Support Vector Machine (SVM) algorithm. Experimental results show two achievements: first, the SCC-dictionary can replace a manually written dictionary with the same effect; second, the FFVSM not only reduces the computing load compared with a high-dimensional feature vector space model, but also maintains a classification precision of 86.87%, a recall of 95.12%, and an F1 value of 90.81%.
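The reported F1 value can be checked against the reported precision and recall. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall, which the abstract does not state explicitly; the function name `f1_score` is illustrative and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, converted from percentages to fractions.
precision = 0.8687
recall = 0.9512

f1 = f1_score(precision, recall)
print(f"F1 = {f1 * 100:.2f}%")  # prints "F1 = 90.81%"
```

Under that assumption, the three figures in the abstract are mutually consistent.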

Original language: English
Title of host publication: Machine Tool Technology, Mechatronics and Information Engineering
Editors: Zhongmin Wang, Liangyu Guo, Jianming Tan, Dongfang Yang, Kun Yang
Publisher: Trans Tech Publications Ltd.
Pages: 2206-2210
Number of pages: 5
ISBN (Electronic): 9783038352464
Publication status: Published - 2014
Event: International Conference on Machine Tool Technology and Mechatronics Engineering, ICMTTME 2014 - Guilin, China
Duration: 22 Jun 2014 to 23 Jun 2014

Publication series

Name: Applied Mechanics and Materials
Volume: 644-650
ISSN (Print): 1660-9336
ISSN (Electronic): 1662-7482

Conference

Conference: International Conference on Machine Tool Technology and Mechatronics Engineering, ICMTTME 2014
Country/Territory: China
City: Guilin
Period: 22/06/14 to 23/06/14

Keywords

  • Chinese text classification
  • FFVSM
  • SCC-dictionary
  • SVM
  • TRIP database
