TY - GEN
T1 - A pragmatic model for new Chinese word extraction
AU - Zhang, Haijun
AU - Huang, Heyan
AU - Zhu, Chaoyong
AU - Shi, Shumin
PY - 2010
Y1 - 2010
AB - This paper proposes a pragmatic model for repeat-based Chinese New Word Extraction (NWE). It contains two innovations. The first is a formal description of the NWE process, which gives theoretical guidance on feature selection. On this basis, the Conditional Random Fields (CRF) model is selected as the statistical framework to solve the formal description. The second is an improved algorithm for left (right) entropy that improves the efficiency of NWE. Compared with the baseline algorithm, the improved algorithm remarkably increases the computational speed of entropy. Overall, experiments show that the proposed model is very effective: the F score is 49.72% in the open test and 69.83% in word extraction, an evident improvement over previous similar work.
KW - Computational efficiency
KW - Formal description
KW - Left (right) entropy
KW - New words extraction
KW - Repeat
UR - http://www.scopus.com/inward/record.url?scp=78649260535&partnerID=8YFLogxK
U2 - 10.1109/NLPKE.2010.5587846
DO - 10.1109/NLPKE.2010.5587846
M3 - Conference contribution
AN - SCOPUS:78649260535
SN - 9781424468966
T3 - Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering, NLP-KE 2010
BT - Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering, NLP-KE 2010
T2 - 6th International Conference on Natural Language Processing and Knowledge Engineering, NLP-KE 2010
Y2 - 21 August 2010 through 23 August 2010
ER -