Extracting Chinese multi-word terms from small corpus

Lang Zhou*, Liang Zhang, Chong Feng, Heyan Huang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

In this paper, we present an automatic terminology extraction approach for Chinese multi-word terms. Besides five linguistic rules acquired from an available term list by machine learning methods, the term extraction system involves two statistical strategies: a termhood measure based on term distribution variation, and a unithood measure that adopts the left and right entropy method to estimate the degree of collocation variation. Candidates are ranked by their termhood values. The unithood measure is used to filter out prepositional phrases and some verb-object phrases that rarely appear as terms. Evaluated on a small-scale corpus in the computer domain, the system achieves a precision of 91.5% on the top 2000 outputs.
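The abstract names left and right entropy as the unithood measure but does not give the exact formulation. The sketch below shows the standard branching-entropy computation for illustration only, not the authors' implementation. It assumes the text is already word-segmented, measures entropy in bits, counts a text edge as a boundary symbol, and uses a hypothetical helper `passes_unithood` with a hypothetical `threshold` parameter; none of these names or values come from the paper.

```python
import math
from collections import Counter


def _entropy(counts):
    """Shannon entropy (in bits) of a frequency table; 0.0 if empty."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def branching_entropies(tokens, candidate):
    """Return (left_entropy, right_entropy) for a candidate token sequence.

    tokens: list of words (assumption: Chinese text already segmented).
    candidate: tuple of words forming the candidate multi-word term.
    """
    n = len(candidate)
    left, right = Counter(), Counter()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) == candidate:
            # Assumption: a text edge is treated as a single boundary symbol.
            left[tokens[i - 1] if i > 0 else "<s>"] += 1
            right[tokens[i + n] if i + n < len(tokens) else "</s>"] += 1
    return _entropy(left), _entropy(right)


def passes_unithood(tokens, candidate, threshold=1.0):
    """Hypothetical filter: keep a candidate only if BOTH its left and right
    contexts vary enough, i.e. the smaller entropy reaches the threshold.
    The threshold value is illustrative, not taken from the paper."""
    h_left, h_right = branching_entropies(tokens, candidate)
    return min(h_left, h_right) >= threshold
```

For example, if "数据库 系统" (database system) occurs three times, preceded by three different words but followed by "。" twice and "和" once, its left entropy is log2(3) ≈ 1.585 bits and its right entropy is about 0.918 bits, so it would fail a threshold of 1.0. Taking the minimum of the two sides is one common design choice: a phrase whose context is fixed on either side (as with many prepositional or verb-object fragments) is then unlikely to be a free-standing unit.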

Original language: English
Host publication title: Proceedings of 2008 3rd International Conference on Intelligent System and Knowledge Engineering, ISKE 2008
Pages: 813-818
Number of pages: 6
DOI
Publication status: Published - 2008
Externally published: Yes
Event: Proceedings of 2008 3rd International Conference on Intelligent System and Knowledge Engineering, ISKE 2008 - Xiamen, China
Duration: 17 Nov 2008 to 19 Nov 2008

Publication series

Name: Proceedings of 2008 3rd International Conference on Intelligent System and Knowledge Engineering, ISKE 2008

Conference

Conference: Proceedings of 2008 3rd International Conference on Intelligent System and Knowledge Engineering, ISKE 2008
Country/Territory: China
City: Xiamen
Period: 17/11/08 to 19/11/08
