Tibetan syllable-based functional chunk boundary identification

Shumin Shi*, Yujian Liu, Tianhang Wang, Congjun Long, Heyan Huang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

Abstract

Tibetan syntactic functional chunk parsing aims to identify the syntactic constituents of Tibetan sentences. In this paper, based on the Tibetan syntactic functional chunk description system, we propose a method that groups syllables directly instead of performing word segmentation and tagging, and uses Conditional Random Fields (CRFs) to identify the functional chunk boundaries of a sentence. Following the characteristics of the Tibetan language, we first identify and extract syntactic markers in the text preprocessing stage and use them as features for chunk boundary identification; these markers occur in two written forms, sticky (attached to the preceding syllable) and non-sticky (written as separate syllables). We then identify the syntactic functional chunk boundaries with a CRF. Experiments on a Tibetan corpus of 46,783 syllables achieved a precision of 75.70%, a recall of 82.54%, and an F-value of 79.12%. The results show that the proposed method is effective on a small-scale unlabeled corpus and can provide foundational support for many natural language processing applications, such as machine translation.
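The preprocessing and labeling steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the marker inventories, the feature names, the helper functions (`split_syllables`, `is_marker`, `syllable_features`, `sentence_features`, `chunks_to_labels`), and the example chunking are all hypothetical. The sketch splits text into syllables on the tsheg, flags sticky and non-sticky marker forms as features, and encodes chunk boundaries as B/I labels, one per syllable, which is the kind of sequence a CRF would learn to predict.

```python
from typing import Dict, List

TSHEG = "\u0f0b"  # Tibetan intersyllabic delimiter (tsheg)
SHAD = "\u0f0d"   # Tibetan clause/sentence delimiter (shad)

# Hypothetical, illustrative marker inventories (NOT the paper's lists).
# Non-sticky markers are written as free-standing syllables.
NON_STICKY_MARKERS = {"གི", "ཀྱི", "གྱི", "ལ", "ན", "ནས", "ལས", "དང"}
# Sticky markers are written attached to the preceding syllable, e.g. genitive "འི".
STICKY_SUFFIXES = ("འི", "འིས")


def split_syllables(text: str) -> List[str]:
    """Split Tibetan text into syllables on tsheg and shad, dropping empty pieces."""
    normalized = text.replace(SHAD, TSHEG)
    return [s.strip() for s in normalized.split(TSHEG) if s.strip()]


def is_marker(syllable: str) -> bool:
    """True if the syllable is, or carries, one of the illustrative markers."""
    return syllable in NON_STICKY_MARKERS or syllable.endswith(STICKY_SUFFIXES)


def syllable_features(sylls: List[str], i: int) -> Dict[str, object]:
    """Feature dict for syllable i: identity, marker flags, and a +/-1 window."""
    s = sylls[i]
    return {
        "syl": s,
        "non_sticky": s in NON_STICKY_MARKERS,
        "sticky": s.endswith(STICKY_SUFFIXES),
        "prev_syl": sylls[i - 1] if i > 0 else "<BOS>",
        "next_syl": sylls[i + 1] if i + 1 < len(sylls) else "<EOS>",
        # A marker on the previous syllable often closes a chunk,
        # so the current syllable becomes a candidate chunk start.
        "prev_is_marker": i > 0 and is_marker(sylls[i - 1]),
    }


def sentence_features(text: str) -> List[Dict[str, object]]:
    """One feature dict per syllable: the sequence a CRF tagger would label."""
    sylls = split_syllables(text)
    return [syllable_features(sylls, i) for i in range(len(sylls))]


def chunks_to_labels(chunk_lengths: List[int]) -> List[str]:
    """Encode chunk lengths (in syllables) as B/I boundary labels."""
    labels: List[str] = []
    for n in chunk_lengths:
        if n < 1:
            raise ValueError("each chunk must contain at least one syllable")
        labels.extend(["B"] + ["I"] * (n - 1))
    return labels


if __name__ == "__main__":
    sentence = "རྒྱལ་པོའི་བུ་མོ་ལ་དེབ་སྤྲད།"
    feats = sentence_features(sentence)
    labels = chunks_to_labels([2, 3, 1, 1])  # hypothetical gold chunking
    for f, y in zip(feats, labels):
        print(f["syl"], f["sticky"], f["non_sticky"], y)
```

Each sentence becomes a list of per-syllable feature dicts paired with a B/I label sequence; a CRF toolkit that accepts sequences of feature dicts (for example, sklearn-crfsuite) would then be trained on many such pairs. Sticky and non-sticky forms are kept as separate features so that the model can weigh them differently.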

Original language: English
Host publication title: Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data - 16th China National Conference, CCL 2017 and 5th International Symposium, NLP-NABD 2017, Proceedings
Editors: Maosong Sun, Baobao Chang, Xiaojie Wang, Deyi Xiong
Publisher: Springer Verlag
Pages: 439-448
Number of pages: 10
ISBN (Print): 9783319690049
DOI
Publication status: Published - 2017
Event: 16th China National Conference on Computational Linguistics, CCL 2017 and 5th International Symposium on Natural Language Processing Based on Naturally Annotated Big Data, NLP-NABD 2017 - Nanjing, China
Duration: 13 Oct 2017 to 15 Oct 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10565 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 16th China National Conference on Computational Linguistics, CCL 2017 and 5th International Symposium on Natural Language Processing Based on Naturally Annotated Big Data, NLP-NABD 2017
Country/Territory: China
City: Nanjing
Period: 13/10/17 to 15/10/17
