TY - GEN
T1 - Research on recognition of semantic chunk boundary in Tibetan
AU - Wang, Tianhang
AU - Shi, Shumin
AU - Huang, Heyan
AU - Long, Congjun
AU - Li, Ruijing
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/12/3
Y1 - 2014/12/3
AB - Semantic chunks describe the semantic framework of a sentence well and play an important role in Natural Language Processing applications such as machine translation and question answering (QA) systems. At present, research on Tibetan chunking is mainly rule-based. In this paper, based on the distinctive linguistic characteristics of Tibetan, we first put forward a descriptive definition of the Tibetan semantic chunk and its labeling scheme, and then propose a feature selection algorithm that automatically selects suitable feature templates from the candidates. In experiments on two kinds of Tibetan corpora, namely corpus-sentence and corpus-discourse, the F-measure reaches 95.84% and 94.95% with the Conditional Random Fields (CRF) model and 91.97% and 88.82% with the Maximum Entropy (ME) model, respectively. These positive results show that the definition of the Tibetan semantic chunk proposed in this paper is reasonable and operable, and that boundary recognition with statistical techniques is feasible and effective on a small-scale corpus.
KW - CRF
KW - ME
KW - Tibetan semantic chunk
KW - chunk boundary recognition
UR - http://www.scopus.com/inward/record.url?scp=84941056782&partnerID=8YFLogxK
U2 - 10.1109/IALP.2014.6973476
DO - 10.1109/IALP.2014.6973476
M3 - Conference contribution
AN - SCOPUS:84941056782
T3 - Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014
SP - 78
EP - 82
BT - Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014
A2 - Banchs, Rafael E.
A2 - Dong, Minghui
A2 - Lu, Yanfeng
A2 - Ranaivo-Malancon, Bali
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - International Conference on Asian Language Processing 2014, IALP 2014
Y2 - 20 October 2014 through 22 October 2014
ER -