Study on Chinese error checking

Chongwen Wang*, Bo Yuan

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

This paper discusses word-level error checking for Chinese text. Word segmentation proceeds in two steps. First, a forward-heuristic longest-match algorithm with reverse backtracking, together with a recursive segmentation algorithm over left and right sub-segments, divides the text into many small loose strings. Second, a forward longest-match algorithm merges the loose strings backward as far as possible; the resulting segmented loose strings are the basis for the later error checking operation. For error detection, an algorithm based on a similar-pronunciation strategy is introduced, which uses a large-scale lexicon (340 million) as the basis for data analysis. An error checking algorithm based on similar shape, which draws on a similar-character table, a Wubi repeat-code table, and a Zhengma repeat-code table, is then introduced to check character errors. Experiments show satisfactory results.
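The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it shows only a plain forward longest-match segmenter and a similar-pronunciation lookup over the single-character leftovers, without the reverse backtracking, recursive sub-segment, or similar-shape steps. The names forward_max_match, suggest_by_pronunciation, LEXICON, and HOMOPHONES, the four-character window, and the tiny lexicon and homophone table are all assumptions made for illustration.

```python
# Toy data, assumed for illustration only; the paper uses a large-scale lexicon.
LEXICON = {"今天", "天气", "很好"}
HOMOPHONES = {"汽": ["气"], "气": ["汽"]}  # characters that share the reading "qi"


def forward_max_match(text, lexicon, max_len=4):
    """Greedy forward longest-match segmentation.

    Characters that start no lexicon word are emitted as single-character
    tokens, which play the role of the "loose strings" in the abstract.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink until a lexicon hit;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def suggest_by_pronunciation(tokens, lexicon, homophones):
    """Suggest corrections for adjacent single-character tokens.

    Each character of the pair is swapped for its listed homophones; a swap
    is reported when it turns the pair into a lexicon word.
    """
    suggestions = []
    for k in range(len(tokens) - 1):
        left, right = tokens[k], tokens[k + 1]
        if len(left) != 1 or len(right) != 1:
            continue
        candidates = [c + right for c in homophones.get(left, ())]
        candidates += [left + c for c in homophones.get(right, ())]
        for candidate in candidates:
            if candidate in lexicon:
                suggestions.append((k, left + right, candidate))
    return suggestions


if __name__ == "__main__":
    tokens = forward_max_match("今天天汽很好", LEXICON)
    print(tokens)
    print(suggest_by_pronunciation(tokens, LEXICON, HOMOPHONES))
```

With the toy data above, the misspelled input 今天天汽很好 leaves 天 and 汽 as loose single characters, and the homophone swap 汽 to 气 recovers the lexicon word 天气. A similar-shape check could reuse the same lookup with a shape-confusion table in place of HOMOPHONES.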

Original language: English
Title of host publication: Advances in Computer Science and Education
Pages: 147-154
Number of pages: 8
Publication status: Published - 2012
Event: 2011 International Conference on Computer Science and Education, CSE 2011 - Wuhan, China
Duration: 26 Nov 2011 to 27 Nov 2011

Publication series

Name: Advances in Intelligent and Soft Computing
Volume: 140 AISC
ISSN (Print): 1867-5662

Conference

Conference: 2011 International Conference on Computer Science and Education, CSE 2011
Country/Territory: China
City: Wuhan
Period: 26/11/11 to 27/11/11

Keywords

  • Chinese error
  • lexicon
  • similar pronunciation
  • similar shape
