TY - GEN
T1 - Study on Chinese error checking
AU - Wang, Chongwen
AU - Yuan, Bo
PY - 2012
Y1 - 2012
N2 - This paper discusses word-level error checking in Chinese. Word segmentation proceeds in two steps. First, a forward-heuristic, reverse-backtracking longest-match algorithm and a recursive word segmentation algorithm over left and right sub-segments divide the text into smaller loose strings. Second, a forward longest-matching algorithm merges the loose strings backward as far as possible; the resulting segmented loose strings are the basis of the subsequent error-checking operation. For error detection, an algorithm based on a similar-pronunciation strategy is introduced; this strategy uses a large-scale lexicon (340 million) as the basis of data analysis. An error-checking algorithm based on similar shape, which draws on a similar-character table, a Wubi repeat-code table, and a Zhengma repeat-code table, is then introduced to check character errors. Experiments show satisfactory results.
AB - This paper discusses word-level error checking in Chinese. Word segmentation proceeds in two steps. First, a forward-heuristic, reverse-backtracking longest-match algorithm and a recursive word segmentation algorithm over left and right sub-segments divide the text into smaller loose strings. Second, a forward longest-matching algorithm merges the loose strings backward as far as possible; the resulting segmented loose strings are the basis of the subsequent error-checking operation. For error detection, an algorithm based on a similar-pronunciation strategy is introduced; this strategy uses a large-scale lexicon (340 million) as the basis of data analysis. An error-checking algorithm based on similar shape, which draws on a similar-character table, a Wubi repeat-code table, and a Zhengma repeat-code table, is then introduced to check character errors. Experiments show satisfactory results.
KW - Chinese error
KW - lexicon
KW - similar pronunciation
KW - similar shape
UR - https://www.scopus.com/pages/publications/84860321679
U2 - 10.1007/978-3-642-27945-4_23
DO - 10.1007/978-3-642-27945-4_23
M3 - Conference contribution
AN - SCOPUS:84860321679
SN - 9783642279447
T3 - Advances in Intelligent and Soft Computing
SP - 147
EP - 154
BT - Advances in Computer Science and Education
T2 - 2011 International Conference on Computer Science and Education, CSE 2011
Y2 - 26 November 2011 through 27 November 2011
ER -