Part-of-speech tagger based on maximum entropy model

Heyan Huang; Xiaofei Zhang

doi:10.1109/ICCSIT.2009.5234787

Heyan Huang^*, Xiaofei Zhang

^*Corresponding author for this work

School of Computer Science

Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

10 Citations (Scopus)

Abstract

Maximum entropy (ME) conditional models are not bound by the independence assumptions made in generative Hidden Markov models, so an ME-based part-of-speech (POS) tagger can depend on arbitrary, non-independent features that benefit POS tagging, without having to account for the distribution of those dependencies. Because ME models can flexibly use a wide variety of features, the data sparseness problem in the training data is effectively addressed. Experiments show that, relative to a Hidden-Markov-Model-based baseline, the POS tagging error rate is reduced by 54.25% in the closed test and 40.56% in the open test, with accuracies of 98.01% in the closed test and 95.56% in the open test.
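The reported figures can be cross-checked with a little arithmetic. The sketch below assumes the stated reductions are relative error-rate reductions, which the abstract does not spell out, and backs out the baseline error rate each pair of numbers would imply. The function name `baseline_error_rate` is illustrative and does not come from the paper.

```python
def baseline_error_rate(accuracy_pct: float, relative_reduction_pct: float) -> float:
    """Baseline error rate (percent) implied by a reported accuracy and a
    relative error-rate reduction. Assumes: new_error = baseline_error * (1 - reduction)."""
    error_pct = 100.0 - accuracy_pct
    return error_pct / (1.0 - relative_reduction_pct / 100.0)


# (accuracy %, error-rate reduction %) as quoted in the abstract.
reported = {"closed test": (98.01, 54.25), "open test": (95.56, 40.56)}

for name, (accuracy, reduction) in reported.items():
    base_err = baseline_error_rate(accuracy, reduction)
    print(f"{name}: implied baseline error {base_err:.2f}%, "
          f"implied baseline accuracy {100.0 - base_err:.2f}%")
```

Under that assumption, the implied baseline accuracy is about 95.65% in the closed test and about 92.53% in the open test. These are derived values, not figures reported in the record.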

Original language	English
Title of host publication	Proceedings - 2009 2nd IEEE International Conference on Computer Science and Information Technology, ICCSIT 2009
Pages	26-29
Number of pages	4
DOI	https://doi.org/10.1109/ICCSIT.2009.5234787
Publication status	Published - 2009
Event	2009 2nd IEEE International Conference on Computer Science and Information Technology, ICCSIT 2009 - Beijing, China. Duration: 8 Aug 2009 → 11 Aug 2009

Publication series

Name	Proceedings - 2009 2nd IEEE International Conference on Computer Science and Information Technology, ICCSIT 2009

Conference

Conference	2009 2nd IEEE International Conference on Computer Science and Information Technology, ICCSIT 2009
Country/Territory	China
City	Beijing
Period	8/08/09 → 11/08/09

Access to document

10.1109/ICCSIT.2009.5234787

Other files and links

Link to publication in Scopus

Cite this

Huang, H., & Zhang, X. (2009). Part-of-speech tagger based on maximum entropy model. In Proceedings - 2009 2nd IEEE International Conference on Computer Science and Information Technology, ICCSIT 2009 (pp. 26-29). Article 5234787 (Proceedings - 2009 2nd IEEE International Conference on Computer Science and Information Technology, ICCSIT 2009). https://doi.org/10.1109/ICCSIT.2009.5234787

@inproceedings{c431b45d52f84fafbe9d9c19ab175871,

title = "Part-of-speech tagger based on maximum entropy model",

abstract = "The maximum entropy (ME) conditional models don't force to adhere to the independence assumption such as in Hidden Markov generative models, and thus the ME-based Part-of-Speech (POS) tagger can depend on arbitrary, nonindependent features, which are benefit to the POS tagging, without accounting for the distribution of those dependencies. Since ME models are able to flexibly utilize a wide variety of features, the sparse problem of training data is efficiently solved. Experiments show that the POS tagging error rate is reduced by 54.25% in close test and 40.56% in open test over the Hidden-Markov-Model-based baseline, and synchronously an accuracy of 98.01% in close test and 95.56% in open test are obtained.",

keywords = "Hidden Markov model (HMM), ME model, Natural language processing (NLP), POS tagging",

author = "Heyan Huang and Xiaofei Zhang",

year = "2009",

doi = "10.1109/ICCSIT.2009.5234787",

language = "English",

isbn = "9781424445196",

series = "Proceedings - 2009 2nd IEEE International Conference on Computer Science and Information Technology, ICCSIT 2009",

pages = "26--29",

booktitle = "Proceedings - 2009 2nd IEEE International Conference on Computer Science and Information Technology, ICCSIT 2009",

note = "2009 2nd IEEE International Conference on Computer Science and Information Technology, ICCSIT 2009 ; Conference date: 08-08-2009 Through 11-08-2009",

}

Huang, H & Zhang, X 2009, Part-of-speech tagger based on maximum entropy model. in Proceedings - 2009 2nd IEEE International Conference on Computer Science and Information Technology, ICCSIT 2009., 5234787, Proceedings - 2009 2nd IEEE International Conference on Computer Science and Information Technology, ICCSIT 2009, pp. 26-29, 2009 2nd IEEE International Conference on Computer Science and Information Technology, ICCSIT 2009, Beijing, China, 8/08/09. https://doi.org/10.1109/ICCSIT.2009.5234787

Part-of-speech tagger based on maximum entropy model. / Huang, Heyan; Zhang, Xiaofei.
Proceedings - 2009 2nd IEEE International Conference on Computer Science and Information Technology, ICCSIT 2009. 2009. p. 26-29 5234787 (Proceedings - 2009 2nd IEEE International Conference on Computer Science and Information Technology, ICCSIT 2009).

TY - GEN

T1 - Part-of-speech tagger based on maximum entropy model

AU - Huang, Heyan

AU - Zhang, Xiaofei

PY - 2009

Y1 - 2009

N2 - The maximum entropy (ME) conditional models don't force to adhere to the independence assumption such as in Hidden Markov generative models, and thus the ME-based Part-of-Speech (POS) tagger can depend on arbitrary, nonindependent features, which are benefit to the POS tagging, without accounting for the distribution of those dependencies. Since ME models are able to flexibly utilize a wide variety of features, the sparse problem of training data is efficiently solved. Experiments show that the POS tagging error rate is reduced by 54.25% in close test and 40.56% in open test over the Hidden-Markov-Model-based baseline, and synchronously an accuracy of 98.01% in close test and 95.56% in open test are obtained.

AB - The maximum entropy (ME) conditional models don't force to adhere to the independence assumption such as in Hidden Markov generative models, and thus the ME-based Part-of-Speech (POS) tagger can depend on arbitrary, nonindependent features, which are benefit to the POS tagging, without accounting for the distribution of those dependencies. Since ME models are able to flexibly utilize a wide variety of features, the sparse problem of training data is efficiently solved. Experiments show that the POS tagging error rate is reduced by 54.25% in close test and 40.56% in open test over the Hidden-Markov-Model-based baseline, and synchronously an accuracy of 98.01% in close test and 95.56% in open test are obtained.

KW - Hidden Markov model (HMM)

KW - ME model

KW - Natural language processing (NLP)

KW - POS tagging

UR - http://www.scopus.com/inward/record.url?scp=70449093855&partnerID=8YFLogxK

U2 - 10.1109/ICCSIT.2009.5234787

DO - 10.1109/ICCSIT.2009.5234787

M3 - Conference contribution

AN - SCOPUS:70449093855

SN - 9781424445196

T3 - Proceedings - 2009 2nd IEEE International Conference on Computer Science and Information Technology, ICCSIT 2009

SP - 26

EP - 29

BT - Proceedings - 2009 2nd IEEE International Conference on Computer Science and Information Technology, ICCSIT 2009

T2 - 2009 2nd IEEE International Conference on Computer Science and Information Technology, ICCSIT 2009

Y2 - 8 August 2009 through 11 August 2009

ER -
