A method of part-of-speech guessing of Chinese unknown words based on combined features

Hai Jun Zhang*, Shu Min Shi, Chong Feng, He Yan Huang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract

Part-of-speech (POS) guessing of unknown words is an essential phase of unknown word identification. This paper applies combined features, both external and internal, to POS guessing of Chinese unknown words under a Conditional Random Field (CRF) model. To achieve high precision, it proposes integrating the Chinese radical, as a new internal feature of Chinese characters, into the existing feature set. Experiments show that the combined features are effective for POS guessing and that the new feature significantly improves performance (precision of up to 94.67%). The results also show that the Chinese radical is an effective internal feature for lexical analysis and has practical value.
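
As a minimal sketch of what combining internal and external features for one unknown word might look like, the following builds a feature dictionary of the kind commonly fed to a CRF toolkit. The function name `combined_features`, the feature names, and the tiny `RADICALS` table are illustrative assumptions; the abstract does not give the paper's actual feature templates or radical dictionary.

```python
# Hypothetical radical lookup; a real system would load a full
# character-to-radical dictionary instead of this tiny sample.
RADICALS = {"河": "氵", "湖": "氵", "打": "扌", "拍": "扌", "林": "木"}


def combined_features(tokens, i):
    """Return a feature dict for the unknown word at tokens[i]."""
    word = tokens[i]
    return {
        # Internal features: taken from the word's own characters.
        "first_char": word[0],
        "last_char": word[-1],
        "length": len(word),
        "first_radical": RADICALS.get(word[0], "NONE"),
        "last_radical": RADICALS.get(word[-1], "NONE"),
        # External features: taken from the surrounding context.
        "prev_word": tokens[i - 1] if i > 0 else "<BOS>",
        "next_word": tokens[i + 1] if i + 1 < len(tokens) else "<EOS>",
    }


# Example: treat "拍打" as the unknown word in a segmented sentence.
feats = combined_features(["他", "在", "拍打", "桌子"], 2)
```

Here both characters of the example word share the hand radical, which is the kind of signal a radical feature could give a classifier when the word itself was never seen in training.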

Original language: English
Title of host publication: Proceedings of the 2009 International Conference on Machine Learning and Cybernetics
Pages: 328-332
Number of pages: 5
Publication status: Published - 2009
Event: 2009 International Conference on Machine Learning and Cybernetics - Baoding, China
Duration: 12 Jul 2009 to 15 Jul 2009

Publication series

Name: Proceedings of the 2009 International Conference on Machine Learning and Cybernetics
Volume: 1

Conference

Conference: 2009 International Conference on Machine Learning and Cybernetics
Country/Territory: China
City: Baoding
Period: 12/07/09 to 15/07/09

Keywords

  • CRF
  • Chinese word segmentation
  • POS guessing
  • Unknown words
