The application of CRFs in part-of-speech tagging

Xiaofei Zhang; Heyan Huang; Liang Zhang

doi:10.1109/IHMSC.2009.210

Corresponding author: Xiaofei Zhang

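School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

6 Citations (Scopus)

Abstract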

Conditional random fields (CRFs) for sequence labeling offer advantages over both generative models such as the hidden Markov model (HMM) and classifiers applied at each sequence position. First, CRFs are not bound by the independence assumption and can therefore depend on arbitrary, non-independent features without modeling the distribution of those dependencies. Because CRF models can flexibly use a wide variety of features, the problem of sparse training data can be resolved efficiently. Moreover, parameter estimation for CRFs is global, which effectively resolves the label bias problem. In this paper, CRFs with Gaussian prior smoothing are used for part-of-speech (POS) tagging. Experiments show that, relative to an HMM-based baseline, the POS tagging error rate is reduced by 55.17% in the closed test and 43.64% in the open test, with accuracies of 98.05% and 95.79%, respectively. These results confirm the superior performance of CRFs.
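The abstract reports CRF accuracies and error-rate reductions but not the HMM baseline figures themselves. As a rough check, the sketch below back-computes the baseline error rates those numbers would imply, assuming the reductions are relative (new error = baseline error × (1 − reduction)). The function name `implied_baseline_error` and the dictionary layout are illustrative choices, and the derived baseline values are not quoted from the paper.

```python
def implied_baseline_error(new_accuracy_pct: float, relative_reduction_pct: float) -> float:
    """Return the baseline error rate (in %) implied by a relative error reduction.

    Assumes: new_error = baseline_error * (1 - relative_reduction).
    """
    new_error = 100.0 - new_accuracy_pct
    return new_error / (1.0 - relative_reduction_pct / 100.0)


# Figures quoted in the abstract: (CRF accuracy %, relative error reduction %).
reported = {"closed": (98.05, 55.17), "open": (95.79, 43.64)}

for name, (accuracy, reduction) in reported.items():
    baseline_error = implied_baseline_error(accuracy, reduction)
    print(
        f"{name} test: CRF error {100.0 - accuracy:.2f}%, "
        f"implied HMM error {baseline_error:.2f}%, "
        f"implied HMM accuracy {100.0 - baseline_error:.2f}%"
    )
```

Under that assumption the implied baseline errors come out near 4.35% (closed test) and 7.47% (open test), which is consistent with the quoted reductions being relative rather than absolute.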

Original language	English
Title of host publication	2009 International Conference on Intelligent Human-Machine Systems and Cybernetics, IHMSC 2009
Pages	347-350
Number of pages	4
DOI	https://doi.org/10.1109/IHMSC.2009.210
Publication status	Published - 2009
Event	2009 International Conference on Intelligent Human-Machine Systems and Cybernetics, IHMSC 2009 - Hangzhou, Zhejiang, China. Duration: 26 Aug 2009 → 27 Aug 2009

Publication series

Name	2009 International Conference on Intelligent Human-Machine Systems and Cybernetics, IHMSC 2009
Volume	2

Conference

Conference	2009 International Conference on Intelligent Human-Machine Systems and Cybernetics, IHMSC 2009
Country/Territory	China
City	Hangzhou, Zhejiang
Period	26/08/09 → 27/08/09

Cite this

Zhang, X., Huang, H., & Zhang, L. (2009). The application of CRFs in part-of-speech tagging. In 2009 International Conference on Intelligent Human-Machine Systems and Cybernetics, IHMSC 2009 (pp. 347-350). Article 5335969 (2009 International Conference on Intelligent Human-Machine Systems and Cybernetics, IHMSC 2009; Vol. 2). https://doi.org/10.1109/IHMSC.2009.210

@inproceedings{89fa0c210a264e029902b4acd70cffe2,
  title = "The application of CRFs in part-of-speech tagging",
  abstract = "Conditional random fields (CRFs) for sequence labeling offer advantages over both generative models such as the hidden Markov model (HMM) and classifiers applied at each sequence position. First, CRFs are not bound by the independence assumption and can therefore depend on arbitrary, non-independent features without modeling the distribution of those dependencies. Because CRF models can flexibly use a wide variety of features, the problem of sparse training data can be resolved efficiently. Moreover, parameter estimation for CRFs is global, which effectively resolves the label bias problem. In this paper, CRFs with Gaussian prior smoothing are used for part-of-speech (POS) tagging. Experiments show that, relative to an HMM-based baseline, the POS tagging error rate is reduced by 55.17% in the closed test and 43.64% in the open test, with accuracies of 98.05% and 95.79%, respectively. These results confirm the superior performance of CRFs.",
  keywords = "CRF, HMM, Natural Language Processing (NLP), POS tagging",
  author = "Xiaofei Zhang and Heyan Huang and Liang Zhang",
  year = "2009",
  doi = "10.1109/IHMSC.2009.210",
  language = "English",
  isbn = "9780769537528",
  series = "2009 International Conference on Intelligent Human-Machine Systems and Cybernetics, IHMSC 2009",
  pages = "347--350",
  booktitle = "2009 International Conference on Intelligent Human-Machine Systems and Cybernetics, IHMSC 2009",
  note = "2009 International Conference on Intelligent Human-Machine Systems and Cybernetics, IHMSC 2009 ; Conference date: 26-08-2009 Through 27-08-2009",
}

Zhang, X, Huang, H & Zhang, L 2009, The application of CRFs in part-of-speech tagging. in 2009 International Conference on Intelligent Human-Machine Systems and Cybernetics, IHMSC 2009., 5335969, 2009 International Conference on Intelligent Human-Machine Systems and Cybernetics, IHMSC 2009, vol. 2, pp. 347-350, 2009 International Conference on Intelligent Human-Machine Systems and Cybernetics, IHMSC 2009, Hangzhou, Zhejiang, China, 26/08/09. https://doi.org/10.1109/IHMSC.2009.210

The application of CRFs in part-of-speech tagging. / Zhang, Xiaofei; Huang, Heyan; Zhang, Liang.
2009 International Conference on Intelligent Human-Machine Systems and Cybernetics, IHMSC 2009. 2009. p. 347-350 5335969 (2009 International Conference on Intelligent Human-Machine Systems and Cybernetics, IHMSC 2009; Vol. 2).

TY  - GEN
T1  - The application of CRFs in part-of-speech tagging
AU  - Zhang, Xiaofei
AU  - Huang, Heyan
AU  - Zhang, Liang
PY  - 2009
Y1  - 2009
AB  - Conditional random fields (CRFs) for sequence labeling offer advantages over both generative models such as the hidden Markov model (HMM) and classifiers applied at each sequence position. First, CRFs are not bound by the independence assumption and can therefore depend on arbitrary, non-independent features without modeling the distribution of those dependencies. Because CRF models can flexibly use a wide variety of features, the problem of sparse training data can be resolved efficiently. Moreover, parameter estimation for CRFs is global, which effectively resolves the label bias problem. In this paper, CRFs with Gaussian prior smoothing are used for part-of-speech (POS) tagging. Experiments show that, relative to an HMM-based baseline, the POS tagging error rate is reduced by 55.17% in the closed test and 43.64% in the open test, with accuracies of 98.05% and 95.79%, respectively. These results confirm the superior performance of CRFs.
KW  - CRF
KW  - HMM
KW  - Natural Language Processing (NLP)
KW  - POS tagging
UR  - http://www.scopus.com/inward/record.url?scp=73649089989&partnerID=8YFLogxK
U2  - 10.1109/IHMSC.2009.210
DO  - 10.1109/IHMSC.2009.210
M3  - Conference contribution
AN  - SCOPUS:73649089989
SN  - 9780769537528
T3  - 2009 International Conference on Intelligent Human-Machine Systems and Cybernetics, IHMSC 2009
SP  - 347
EP  - 350
BT  - 2009 International Conference on Intelligent Human-Machine Systems and Cybernetics, IHMSC 2009
T2  - 2009 International Conference on Intelligent Human-Machine Systems and Cybernetics, IHMSC 2009
Y2  - 26 August 2009 through 27 August 2009
ER  -
