The state of the art and difficulties in automatic Chinese word segmentation

Chun Xia Zhang*, Tian Yong Hao

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

28 Citations (Scopus)

Abstract

Automatic Chinese word segmentation is a fundamental research problem underlying Chinese information processing tasks such as information extraction, information retrieval, machine translation, text classification, automatic text summarization, speech recognition, text-to-speech, and natural language understanding. Although it has been investigated for more than twenty years, it remains a bottleneck for Chinese information processing. We give a detailed analysis of the state of the art in automatic Chinese word segmentation, build a formal model of word segmentation, and discuss the factors that affect word segmentation, together with its two major difficulties and approaches to resolving them. Finally, we point out existing problems, especially those concerning word segmentation evaluation, as well as the research problems that remain to be resolved.

Original language: English
Pages (from-to): 138-143+147
Journal: Xitong Fangzhen Xuebao / Journal of System Simulation
Volume: 17
Issue number: 1
Publication status: Published - Jan 2005
Externally published: Yes

Keywords

  • Automatic Chinese word segmentation
  • Formal model
  • Unknown words
  • Word segmentation evaluation
