The state of the art and difficulties in automatic Chinese word segmentation

Chun Xia Zhang*, Tian Yong Hao

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

28 Citations (Scopus)

Abstract

Automatic Chinese word segmentation is a fundamental research problem in Chinese information processing, underlying tasks such as information extraction, information retrieval, machine translation, text classification, automatic text summarization, speech recognition, text-to-speech, and natural language understanding. Although it has been investigated for more than twenty years, it remains a bottleneck for Chinese information processing. We give a detailed analysis of the state of the art in automatic Chinese word segmentation, build a formal model of word segmentation, and discuss the factors affecting word segmentation, its two major difficulties, and their resolutions. Finally, we point out the existing problems, especially those concerning the evaluation of word segmentation, as well as the research problems that remain to be resolved.

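Original language: English
Pages (from-to): 138-143+147
Journal: Xitong Fangzhen Xuebao / Journal of System Simulation
Volume: 17
Issue number: 1
Publication status: Published - Jan 2005
Externally published: Yes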
