A pragmatic model for new Chinese word extraction

Haijun Zhang*, Heyan Huang, Chaoyong Zhu, Shumin Shi

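*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

5 citations (Scopus)

Abstract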

This paper proposes a pragmatic model for repeat-based Chinese New Word Extraction (NWE). It makes two contributions. The first is a formal description of the NWE process, which provides theoretical guidance for feature selection. On this basis, the Conditional Random Fields (CRF) model is selected as the statistical framework for solving the formal description. The second is an improved algorithm for computing left (right) entropy, which improves the efficiency of NWE. Compared with the baseline algorithm, the improved algorithm markedly speeds up entropy computation. Overall, experiments show that the proposed model is effective: the F score is 49.72% in the open test and 69.83% in word extraction, an evident improvement over previous similar work.
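For readers unfamiliar with left (right) entropy, the following is a minimal sketch of a naive computation, assuming the usual definition: the Shannon entropy of the characters that immediately precede (follow) a candidate string in a corpus. It is not the improved algorithm described in the abstract, whose details are not given here; the names `branch_entropy` and `_entropy` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def _entropy(counts):
    """Shannon entropy (in bits) of a Counter of neighbouring characters."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def branch_entropy(corpus, candidate):
    """Return (left_entropy, right_entropy) of `candidate` in `corpus`.

    Naive approach (an assumption, not the paper's method): scan every
    occurrence, including overlapping ones, and tally the adjacent characters.
    """
    left, right = Counter(), Counter()
    start = corpus.find(candidate)
    while start != -1:
        end = start + len(candidate)
        if start > 0:
            left[corpus[start - 1]] += 1   # character just before the match
        if end < len(corpus):
            right[corpus[end]] += 1        # character just after the match
        start = corpus.find(candidate, start + 1)
    return _entropy(left), _entropy(right)
```

A candidate whose neighbours vary widely on both sides scores high on both entropies, which is commonly taken as evidence that it is a free-standing word rather than a fragment of a longer one. Each call rescans the whole corpus, so this naive version becomes slow when many candidates are scored.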

Original language: English
Title of host publication: Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering, NLP-KE, 2010
Publication status: Published - 2010
Event: 6th International Conference on Natural Language Processing and Knowledge Engineering, NLP-KE 2010 - Beijing, China
Duration: 21 Aug 2010 to 23 Aug 2010

Publication series

Name: Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering, NLP-KE 2010

Conference

Conference: 6th International Conference on Natural Language Processing and Knowledge Engineering, NLP-KE 2010
Country/Territory: China
City: Beijing
Period: 21/08/10 to 23/08/10
