TY - GEN
T1 - The application of CRFs in part-of-speech tagging
AU - Zhang, Xiaofei
AU - Huang, Heyan
AU - Zhang, Liang
PY - 2009
Y1 - 2009
N2 - Conditional random fields (CRFs) for sequence labeling offer advantages over both generative models, such as the hidden Markov model (HMM), and classifiers applied at each sequence position. First, CRFs are not forced to adhere to the independence assumption and can therefore depend on arbitrary, non-independent features without accounting for the distribution of those dependencies. Because CRF models can flexibly exploit a wide variety of features, the sparse training data problem can be resolved efficiently. Moreover, parameter estimation for CRFs is global, which effectively resolves the label bias problem. In this paper, CRFs with Gaussian prior smoothing are used for part-of-speech (POS) tagging. Experiments show that, compared with an HMM-based baseline, the POS tagging error rate is reduced by 55.17% in the closed test and by 43.64% in the open test, with accuracies of 98.05% and 95.79%, respectively. These results confirm the superior performance of CRFs.
AB - Conditional random fields (CRFs) for sequence labeling offer advantages over both generative models, such as the hidden Markov model (HMM), and classifiers applied at each sequence position. First, CRFs are not forced to adhere to the independence assumption and can therefore depend on arbitrary, non-independent features without accounting for the distribution of those dependencies. Because CRF models can flexibly exploit a wide variety of features, the sparse training data problem can be resolved efficiently. Moreover, parameter estimation for CRFs is global, which effectively resolves the label bias problem. In this paper, CRFs with Gaussian prior smoothing are used for part-of-speech (POS) tagging. Experiments show that, compared with an HMM-based baseline, the POS tagging error rate is reduced by 55.17% in the closed test and by 43.64% in the open test, with accuracies of 98.05% and 95.79%, respectively. These results confirm the superior performance of CRFs.
KW - CRF
KW - HMM
KW - Natural Language Processing (NLP)
KW - POS tagging
UR - http://www.scopus.com/inward/record.url?scp=73649089989&partnerID=8YFLogxK
U2 - 10.1109/IHMSC.2009.210
DO - 10.1109/IHMSC.2009.210
M3 - Conference contribution
AN - SCOPUS:73649089989
SN - 9780769537528
T3 - 2009 International Conference on Intelligent Human-Machine Systems and Cybernetics, IHMSC 2009
SP - 347
EP - 350
BT - 2009 International Conference on Intelligent Human-Machine Systems and Cybernetics, IHMSC 2009
T2 - 2009 International Conference on Intelligent Human-Machine Systems and Cybernetics, IHMSC 2009
Y2 - 26 August 2009 through 27 August 2009
ER -