
The discovery of natural typing annotations: User-produced potential Chinese word delimiters

  • Dakui Zhang
  • Yu Mao
  • Yang Liu
  • Hanshi Wang
  • Chuyuan Wei
  • Shiping Tang

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

Abstract

Human-labeled corpora are indispensable for training supervised word segmenters. However, labeling a corpus manually is time-consuming and labor-intensive. When typing Chinese text with Pinyin, people usually need to press "space" or a numeric key to choose among homophonous candidate words, and these keystrokes can be viewed as cues for segmentation. We argue that this process can be used to build a labeled corpus in a more natural way. Thus, in this paper, we investigate Natural Typing Annotations (NTAs), which are potential word delimiters produced by users while typing Chinese. A detailed analysis of over three hundred user-produced texts containing NTAs reveals that high-quality NTAs mostly agree with gold segmentation and, consequently, can be used to improve the performance of supervised word segmentation models on out-of-domain text. Experiments show that a classification model combined with a voting mechanism can reliably identify high-quality NTA texts, which serve as a more readily available labeled corpus. Furthermore, NTAs might be particularly useful for dealing with out-of-vocabulary (OOV) words such as proper names and neologisms.
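To make the idea of "agreeing with gold segmentation" concrete, here is a minimal Python sketch. It is not the authors' implementation: the input format (a list of text chunks, one per candidate selection made by the user) and the function names `boundaries` and `nta_precision` are assumptions introduced purely for illustration. Each point where one committed chunk ends and the next begins is treated as a potential delimiter, and the sketch measures what fraction of those delimiters coincide with gold word boundaries.

```python
from itertools import accumulate


def boundaries(chunks):
    """Return the character offsets where one chunk ends and the next begins."""
    # Cumulative lengths give the end offset of each chunk; the last offset
    # is the end of the text, which is not an internal boundary, so drop it.
    ends = list(accumulate(len(chunk) for chunk in chunks))
    return set(ends[:-1])


def nta_precision(committed_chunks, gold_words):
    """Fraction of user-produced boundaries that are also gold boundaries.

    `committed_chunks` is a hypothetical representation of the text as the
    user committed it, one chunk per candidate selection.
    """
    if "".join(committed_chunks) != "".join(gold_words):
        raise ValueError("chunks and gold words must spell the same text")
    nta = boundaries(committed_chunks)
    gold = boundaries(gold_words)
    if not nta:
        return 1.0  # no delimiters were produced, so none can be wrong
    return len(nta & gold) / len(nta)


# Made-up example: the user committed the sentence in three selections.
committed = ["我们", "喜欢", "自然语言处理"]
gold = ["我们", "喜欢", "自然", "语言", "处理"]
print(sorted(boundaries(committed)))   # [2, 4]
print(nta_precision(committed, gold))  # 1.0
```

In this made-up example every user-produced delimiter falls on a gold boundary, even though the user did not mark every gold boundary. That is the sense in which such annotations can be partial yet still agree with the gold segmentation.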

Original language: English
Title of host publication: ACL-IJCNLP 2015 - 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 662-667
Number of pages: 6
ISBN (Electronic): 9781941643730
DOI
Publication status: Published - 2015
Event: 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL-IJCNLP 2015 - Beijing, China
Duration: 26 Jul 2015 to 31 Jul 2015

Publication series

Name: ACL-IJCNLP 2015 - 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Proceedings of the Conference
Volume: 2

Conference

Conference: 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL-IJCNLP 2015
Country/Territory: China
City: Beijing
Period: 26/07/15 to 31/07/15
