Classifying commas for patent machine translation

Hongzheng Li*, Yun Zhu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Peer-reviewed

Abstract

Commas are widely used in Chinese and play an important role in detecting the boundaries of basic units in sentences and discourse. Targeting Chinese-English patent machine translation, this paper presents two methods that use rich linguistic information to distinguish commas that separate sub-sentences from commas that do not. The first method employs a word knowledge base and formal rules to determine the roles of commas, while the second uses machine learning approaches. The experimental results show that the overall F1 scores of the rule-based method are higher than 93%, indicating that the approach performs well in classifying commas. The classifiers, on the other hand, show some differences. We also conclude that identifying commas can improve the quality of translation output.

Original language: English
Title of host publication: Machine Translation - 12th China Workshop, CWMT 2016, Revised Selected Papers
Editors: Shujie Liu, Muyun Yang
Publisher: Springer Verlag
Pages: 91-101
Number of pages: 11
ISBN (Print): 9789811036347
DOI
Publication status: Published - 2016
Externally published: Yes
Event: 12th China Workshop on Machine Translation, CWMT 2016 - Urumqi, China
Duration: 25 Aug 2016 to 26 Aug 2016

Publication series

Name: Communications in Computer and Information Science
Volume: 668
ISSN (Print): 1865-0929

Conference

Conference: 12th China Workshop on Machine Translation, CWMT 2016
Country/Territory: China
City: Urumqi
Period: 25/08/16 to 26/08/16
