Abstract
Automatic Chinese word segmentation is a fundamental problem underlying Chinese information processing tasks such as information extraction, information retrieval, machine translation, text classification, automatic text summarization, speech recognition, text-to-speech synthesis, and natural language understanding. Although it has been studied for more than twenty years, it remains a bottleneck in Chinese information processing. We give a detailed analysis of the state of the art in automatic Chinese word segmentation, build a formal model of word segmentation, discuss the factors that affect segmentation as well as its two major difficulties and their resolutions, and finally point out the remaining problems, especially in word segmentation evaluation, together with the research problems still to be resolved.
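To make the task concrete, the sketch below shows forward and backward maximum matching, two classic dictionary-based baselines. This is not the formal model proposed in the paper. The function names, the toy lexicon, and the `max_len` window are illustrative assumptions. The example string 研究生命起源 ("study the origin of life") shows how a greedy matcher can pick a wrong split, which illustrates segmentation ambiguity.

```python
from typing import List, Set


def forward_max_match(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Greedy left-to-right segmentation (illustrative baseline, not the paper's model).

    At each position take the longest lexicon word starting there;
    fall back to a single character if nothing matches.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest window first, shrinking until a lexicon hit
        # or a single character remains.
        for j in range(min(len(text), i + max_len), i, -1):
            piece = text[i:j]
            if piece in lexicon or j == i + 1:
                words.append(piece)
                i = j
                break
    return words


def backward_max_match(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Greedy right-to-left segmentation (illustrative baseline)."""
    words = []
    j = len(text)
    while j > 0:
        # Try the longest word ending at position j first.
        for i in range(max(0, j - max_len), j):
            piece = text[i:j]
            if piece in lexicon or i == j - 1:
                words.append(piece)
                j = i
                break
    words.reverse()  # words were collected from right to left
    return words


if __name__ == "__main__":
    # Hypothetical toy lexicon chosen to expose an overlapping ambiguity.
    lexicon = {"研究", "研究生", "生命", "命", "起源"}
    print(forward_max_match("研究生命起源", lexicon))   # ['研究生', '命', '起源']
    print(backward_max_match("研究生命起源", lexicon))  # ['研究', '生命', '起源']
```

When the two directions disagree, as here, the disagreement flags a span whose segmentation is ambiguous. Simple systems sometimes use this as a cheap detector before applying a more informed disambiguation step.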
Original language | English |
---|---|
Pages (from-to) | 138-143+147 |
Journal | Xitong Fangzhen Xuebao / Journal of System Simulation |
Volume | 17 |
Issue number | 1 |
Publication status | Published - Jan 2005 |
Externally published | Yes |
Keywords
- Automatic Chinese word segmentation
- Formal model
- Unknown words
- Word segmentation evaluation