Research on recognition of semantic chunk boundary in Tibetan

Tianhang Wang, Shumin Shi*, Heyan Huang, Congjun Long, Ruijing Li

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

1 Citation (Scopus)

Abstract

Semantic chunks describe the semantic framework of a sentence well and play an important role in Natural Language Processing applications such as machine translation and question answering (QA) systems. At present, research on Tibetan chunking relies mainly on rule-based methods. In this paper, based on the distinctive linguistic characteristics of Tibetan, we first put forward a descriptive definition of the Tibetan semantic chunk and its labeling scheme, and then propose a feature selection algorithm that automatically selects suitable templates from the candidate feature templates. In experiments on two different kinds of Tibetan corpus, namely corpus-sentence and corpus-discourse, the F-measure reaches 95.84% and 94.95% with the Conditional Random Fields (CRF) model and 91.97% and 88.82% with the Maximum Entropy (ME) model, respectively. These positive results show that the definition of the Tibetan semantic chunk proposed in this paper is reasonable and operable. Furthermore, boundary recognition with statistical techniques is feasible and effective on a small-scale corpus.

Original language: English
Title of host publication: Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014
Editors: Rafael E. Banchs, Minghui Dong, Yanfeng Lu, Bali Ranaivo-Malancon
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 78-82
Number of pages: 5
ISBN (Electronic): 9781479953301
DOIs
Publication status: Published - 3 Dec 2014
Event: International Conference on Asian Language Processing 2014, IALP 2014 - Kuching, Malaysia
Duration: 20 Oct 2014 to 22 Oct 2014

Publication series

Name: Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014

Conference

Conference: International Conference on Asian Language Processing 2014, IALP 2014
Country/Territory: Malaysia
City: Kuching
Period: 20/10/14 to 22/10/14

Keywords

  • CRF
  • ME
  • Tibetan semantic chunk
  • chunk boundary recognition
