Which performs better for new word detection, character based or Chinese Word Segmentation based?

Haijun Zhang, Shumin Shi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract

This paper proposes a novel method for evaluating the performance of New Word Detection (NWD) based on repeats extraction. For small-scale corpora, we employ Conditional Random Fields (CRF) as the statistical framework to estimate the effects of different NWD strategies. For large-scale corpora, annotated data is not available in unlimited quantity, so comparative experiments cannot be used for evaluation. Accordingly, this paper proposes a pragmatic quantitative model to analyze and estimate the performance of NWD in all cases, especially for large-scale corpora. The experimental results and the conclusions drawn from the quantitative model corroborate each other. Based on the analysis of the experimental data and the quantitative model, we reach a reliable conclusion about the effects of the two strategies (character based and Chinese Word Segmentation based) on Chinese NWD, which can guide follow-up studies in Chinese new word detection.

Original language: English
Title of host publication: Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014
Editors: Rafael E. Banchs, Minghui Dong, Yanfeng Lu, Bali Ranaivo-Malancon
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 10-14
Number of pages: 5
ISBN (Electronic): 9781479953301
DOI: https://doi.org/10.1109/IALP.2014.6973474
Publication status: Published - 3 Dec 2014
Event: International Conference on Asian Language Processing 2014, IALP 2014 - Kuching, Malaysia
Duration: 20 Oct 2014 to 22 Oct 2014

Publication series

Name: Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014

Conference

Conference: International Conference on Asian Language Processing 2014, IALP 2014
Country/Territory: Malaysia
City: Kuching
Period: 20/10/14 to 22/10/14

Cite this

Zhang, H., & Shi, S. (2014). Which performs better for new word detection, character based or Chinese Word Segmentation based? In R. E. Banchs, M. Dong, Y. Lu, & B. Ranaivo-Malancon (Eds.), Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014 (pp. 10-14). Article 6973474 (Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IALP.2014.6973474