Research on recognition of semantic chunk boundary in Tibetan

Tianhang Wang; Shumin Shi; Heyan Huang; Congjun Long; Ruijing Li

doi:10.1109/IALP.2014.6973476

Tianhang Wang, Shumin Shi*, Heyan Huang, Congjun Long, Ruijing Li

*Corresponding author for this work

School of Computer Science and Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Semantic chunks describe the semantic framework of a sentence well and play an important role in Natural Language Processing applications such as machine translation and question answering (QA) systems. At present, research on Tibetan chunking is mainly rule-based. In this paper, based on the distinctive linguistic characteristics of Tibetan, we first put forward a descriptive definition of the Tibetan semantic chunk and its labeling scheme, and then propose a feature selection algorithm that automatically selects suitable templates from the candidate feature templates. In experiments on two different kinds of Tibetan corpora, namely corpus-sentence and corpus-discourse, the F-measure reaches 95.84%, 94.95% and 91.97%, 88.82% using the Conditional Random Fields (CRF) model and the Maximum Entropy (ME) model, respectively. These positive results show that the definition of the Tibetan semantic chunk given in this paper is reasonable and operable, and that its boundary recognition is feasible and effective using statistical techniques on a small-scale corpus.
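The abstract reports results as F-measure scores. As a minimal sketch of how such a score could be computed for chunk boundary recognition, the Python below treats the F-measure as the balanced harmonic mean of precision and recall over sets of boundary positions. The function name, the representation of boundaries as token indices, and the sample data are all assumptions for illustration; the paper's actual evaluation protocol is not described in this record.

```python
def boundary_f_measure(gold, predicted):
    """Return (precision, recall, F1) for two collections of boundary positions.

    Assumption: a boundary is identified by the index of the token it follows,
    and a predicted boundary counts as correct only on an exact index match.
    """
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical data: token indices after which a semantic chunk ends.
gold_boundaries = [2, 5, 9, 12]
predicted_boundaries = [2, 5, 8, 12]

p, r, f = boundary_f_measure(gold_boundaries, predicted_boundaries)
print(f"P={p:.2%} R={r:.2%} F={f:.2%}")
```

Here three of the four predicted boundaries match the gold boundaries, so precision, recall, and F-measure all come out at 75%.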

Original language	English
Title of host publication	Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014
Editors	Rafael E. Banchs, Minghui Dong, Yanfeng Lu, Bali Ranaivo-Malancon
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	78-82
Number of pages	5
ISBN (Electronic)	9781479953301
DOIs	https://doi.org/10.1109/IALP.2014.6973476
Publication status	Published - 3 Dec 2014
Event	International Conference on Asian Language Processing 2014, IALP 2014 - Kuching, Malaysia; Duration: 20 Oct 2014 → 22 Oct 2014

Publication series

Name	Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014

Conference

Conference	International Conference on Asian Language Processing 2014, IALP 2014
Country/Territory	Malaysia
City	Kuching
Period	20/10/14 → 22/10/14

Keywords

CRF
ME
Tibetan semantic chunk
chunk boundary recognition

Cite this

Wang, T., Shi, S., Huang, H., Long, C., & Li, R. (2014). Research on recognition of semantic chunk boundary in Tibetan. In R. E. Banchs, M. Dong, Y. Lu, & B. Ranaivo-Malancon (Eds.), Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014 (pp. 78-82). Article 6973476 (Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IALP.2014.6973476

Wang, Tianhang ; Shi, Shumin ; Huang, Heyan et al. / Research on recognition of semantic chunk boundary in Tibetan. Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014. editor / Rafael E. Banchs ; Minghui Dong ; Yanfeng Lu ; Bali Ranaivo-Malancon. Institute of Electrical and Electronics Engineers Inc., 2014. pp. 78-82 (Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014).

@inproceedings{bcbbc5ca4c1e446ea98be752c7eb6458,

title = "Research on recognition of semantic chunk boundary in Tibetan",

abstract = "Semantic chunk is able to well describe the sentence semantic framework. It plays a very important role in Natural Language Processing applications, such as machine translation, QA system and so on. At present, the Tibetan chunk researches are mainly based on rule-methods. In this paper, according to the distinctive language characteristics of Tibetan, we firstly put forward the descriptive definition of the Tibetan semantic chunk and its labeling scheme and then we propose a feature selection algorithm to select the suitable ones automatically from the candidate feature-templates. Through the experiment conducted on the two different kinds of Tibetan corpus, namely corpus-sentence and corpus-discourse, the F-Measure achieves 95.84%, 94.95% and 91.97%, 88.82% by using of Conditional Random Fields (CRF) model and Maximum Entropy (ME) model respectively. The positive results show that the definition of Tibetan semantic chunk in this paper is reasonable and operable. Furthermore, its boundary recognition is feasible and effective via statistical techniques in small scale corpus.",

keywords = "CRF, ME, Tibetan semantic chunk, chunk boundary recognition",

author = "Tianhang Wang and Shumin Shi and Heyan Huang and Congjun Long and Ruijing Li",

note = "Publisher Copyright: {\textcopyright} 2014 IEEE.; International Conference on Asian Language Processing 2014, IALP 2014 ; Conference date: 20-10-2014 Through 22-10-2014",

year = "2014",

month = dec,

day = "3",

doi = "10.1109/IALP.2014.6973476",

language = "English",

series = "Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "78--82",

editor = "Banchs, {Rafael E.} and Minghui Dong and Yanfeng Lu and Bali Ranaivo-Malancon",

booktitle = "Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014",

address = "United States",

}

Wang, T, Shi, S, Huang, H, Long, C & Li, R 2014, Research on recognition of semantic chunk boundary in Tibetan. in RE Banchs, M Dong, Y Lu & B Ranaivo-Malancon (eds), Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014, 6973476, Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014, Institute of Electrical and Electronics Engineers Inc., pp. 78-82, International Conference on Asian Language Processing 2014, IALP 2014, Kuching, Malaysia, 20/10/14. https://doi.org/10.1109/IALP.2014.6973476

Research on recognition of semantic chunk boundary in Tibetan. / Wang, Tianhang; Shi, Shumin; Huang, Heyan et al.
Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014. ed. / Rafael E. Banchs; Minghui Dong; Yanfeng Lu; Bali Ranaivo-Malancon. Institute of Electrical and Electronics Engineers Inc., 2014. p. 78-82 6973476 (Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014).

TY - GEN

T1 - Research on recognition of semantic chunk boundary in Tibetan

AU - Wang, Tianhang

AU - Shi, Shumin

AU - Huang, Heyan

AU - Long, Congjun

AU - Li, Ruijing

PY - 2014/12/3

Y1 - 2014/12/3

N2 - Semantic chunk is able to well describe the sentence semantic framework. It plays a very important role in Natural Language Processing applications, such as machine translation, QA system and so on. At present, the Tibetan chunk researches are mainly based on rule-methods. In this paper, according to the distinctive language characteristics of Tibetan, we firstly put forward the descriptive definition of the Tibetan semantic chunk and its labeling scheme and then we propose a feature selection algorithm to select the suitable ones automatically from the candidate feature-templates. Through the experiment conducted on the two different kinds of Tibetan corpus, namely corpus-sentence and corpus-discourse, the F-Measure achieves 95.84%, 94.95% and 91.97%, 88.82% by using of Conditional Random Fields (CRF) model and Maximum Entropy (ME) model respectively. The positive results show that the definition of Tibetan semantic chunk in this paper is reasonable and operable. Furthermore, its boundary recognition is feasible and effective via statistical techniques in small scale corpus.

AB - Semantic chunk is able to well describe the sentence semantic framework. It plays a very important role in Natural Language Processing applications, such as machine translation, QA system and so on. At present, the Tibetan chunk researches are mainly based on rule-methods. In this paper, according to the distinctive language characteristics of Tibetan, we firstly put forward the descriptive definition of the Tibetan semantic chunk and its labeling scheme and then we propose a feature selection algorithm to select the suitable ones automatically from the candidate feature-templates. Through the experiment conducted on the two different kinds of Tibetan corpus, namely corpus-sentence and corpus-discourse, the F-Measure achieves 95.84%, 94.95% and 91.97%, 88.82% by using of Conditional Random Fields (CRF) model and Maximum Entropy (ME) model respectively. The positive results show that the definition of Tibetan semantic chunk in this paper is reasonable and operable. Furthermore, its boundary recognition is feasible and effective via statistical techniques in small scale corpus.

KW - CRF

KW - ME

KW - Tibetan semantic chunk

KW - chunk boundary recognition

UR - http://www.scopus.com/inward/record.url?scp=84941056782&partnerID=8YFLogxK

U2 - 10.1109/IALP.2014.6973476

DO - 10.1109/IALP.2014.6973476

M3 - Conference contribution

AN - SCOPUS:84941056782

T3 - Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014

SP - 78

EP - 82

BT - Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014

A2 - Banchs, Rafael E.

A2 - Dong, Minghui

A2 - Lu, Yanfeng

A2 - Ranaivo-Malancon, Bali

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - International Conference on Asian Language Processing 2014, IALP 2014

Y2 - 20 October 2014 through 22 October 2014

ER -

Wang T, Shi S, Huang H, Long C, Li R. Research on recognition of semantic chunk boundary in Tibetan. In Banchs RE, Dong M, Lu Y, Ranaivo-Malancon B, editors, Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014. Institute of Electrical and Electronics Engineers Inc. 2014. p. 78-82. 6973476. (Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014). doi: 10.1109/IALP.2014.6973476
