Research on recognition of semantic chunk boundary in Tibetan

Tianhang Wang, Shumin Shi*, Heyan Huang, Congjun Long, Ruijing Li

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

1 Citation (Scopus)

Abstract

Semantic chunks describe the semantic framework of a sentence well and play an important role in Natural Language Processing applications such as machine translation and question answering (QA) systems. At present, research on Tibetan chunks is mainly rule-based. In this paper, based on the distinctive language characteristics of Tibetan, we first put forward a descriptive definition of the Tibetan semantic chunk and its labeling scheme, and then propose a feature selection algorithm that automatically selects suitable templates from the candidate feature templates. In experiments on two kinds of Tibetan corpus, a sentence corpus and a discourse corpus, the F-measure reaches 95.84% and 94.95% with the Conditional Random Fields (CRF) model and 91.97% and 88.82% with the Maximum Entropy (ME) model, respectively. These positive results show that the definition of the Tibetan semantic chunk given in this paper is reasonable and operable, and that boundary recognition with statistical techniques is feasible and effective on a small-scale corpus.
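As an illustration of how an F-measure for chunk boundary recognition can be computed, here is a minimal sketch. It assumes a simple B/I/O tagging scheme and exact-match scoring of chunk spans; the abstract does not state the paper's actual labeling scheme or scoring procedure, so the tag set and the function names `bio_to_spans` and `f_measure` are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a B/I/O tag sequence into a set of chunk spans.

    Assumed scheme (not taken from the paper): "B" opens a chunk,
    "I" continues it, "O" is outside any chunk.
    """
    spans: Set[Span] = set()
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:
                spans.add((start, i))  # close the previous chunk
            start = i
        elif tag == "O":
            if start is not None:
                spans.add((start, i))
            start = None
        # "I" continues the open chunk; a stray "I" with no open chunk is ignored
    if start is not None:
        spans.add((start, len(tags)))
    return spans


def f_measure(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over exactly matching chunk spans."""
    gold_spans = bio_to_spans(gold)
    pred_spans = bio_to_spans(pred)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = ["B", "I", "O", "B", "I", "I"]
    pred = ["B", "I", "O", "B", "I", "O"]
    print(f_measure(gold, pred))  # (0.5, 0.5, 0.5)
```

In this toy example the predicted tags recover the first chunk exactly but cut the second one short, so one of two predicted spans matches one of two gold spans, and precision, recall, and F1 are all 0.5.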

Original language: English
Title of host publication: Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014
Editors: Rafael E. Banchs, Minghui Dong, Yanfeng Lu, Bali Ranaivo-Malancon
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 78-82
Number of pages: 5
ISBN (Electronic): 9781479953301
DOI
Publication status: Published - 3 Dec 2014
Event: International Conference on Asian Language Processing 2014, IALP 2014 - Kuching, Malaysia
Duration: 20 Oct 2014 to 22 Oct 2014

Publication series

Name: Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014

Conference

Conference: International Conference on Asian Language Processing 2014, IALP 2014
Country/Territory: Malaysia
City: Kuching
Period: 20/10/14 to 22/10/14
