A hybrid sentence splitting method by comma insertion for machine translation with CRF

Shuli Yang, Chong Feng*, Heyan Huang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

When writing formal articles, many English writers use long sentences with few punctuation marks. Because long sentences are difficult for machine translation systems, many researchers split them at punctuation marks before translation. However, sentences with few punctuation marks remain hard to handle. In this paper, we use a log-linear model to insert commas at appropriate positions in long sentences, aiming to shorten the resulting sub-sentences and thereby benefit machine translation. Experimental results show that our method segments long sentences reasonably and improves the quality of machine translation.
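
The abstract frames comma insertion as choosing, for each word, whether a comma should follow it, using a log-linear (CRF-style) model, and then splitting the sentence at those commas. The decoding step can be sketched as follows. This sketch makes several assumptions. It uses a two-label scheme (`O`, `COMMA`), a tiny feature set based on the next word, and hand-set weights in place of learned ones. The label names, features, weights and function names (`viterbi`, `insert_commas`, `split_at_commas`) are all hypothetical illustrations, not the authors' model, whose features and training procedure are not described in this record.

```python
from typing import Dict, List, Tuple

# Hypothetical label set: COMMA means "insert a comma after this token".
LABELS = ("O", "COMMA")

# Hypothetical hand-set weights; a real system would learn these
# from comma-annotated text with a CRF trainer.
EMISSION: Dict[Tuple[str, str], float] = {
    ("next=which", "COMMA"): 2.0,
    ("next=and", "COMMA"): 1.0,
    ("next=but", "COMMA"): 1.5,
    ("bias", "O"): 0.5,  # by default, prefer no comma
}
TRANSITION: Dict[Tuple[str, str], float] = {
    ("COMMA", "COMMA"): -3.0,  # discourage commas after two consecutive tokens
}


def features(tokens: List[str], i: int) -> List[str]:
    """Illustrative features for position i: a bias and the following word."""
    nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    return ["bias", "next=" + nxt]


def score(tokens: List[str], i: int, label: str) -> float:
    """Sum of emission weights for the features active at position i."""
    return sum(EMISSION.get((f, label), 0.0) for f in features(tokens, i))


def viterbi(tokens: List[str]) -> List[str]:
    """Return the label sequence with the highest linear-chain score."""
    # best[i][y]: best score of any label prefix ending in label y at position i
    best = [{y: score(tokens, 0, y) for y in LABELS}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(tokens)):
        best.append({})
        back.append({})
        for y in LABELS:
            prev, s = max(
                ((p, best[i - 1][p] + TRANSITION.get((p, y), 0.0)) for p in LABELS),
                key=lambda t: t[1],
            )
            best[i][y] = s + score(tokens, i, y)
            back[i][y] = prev
    # Follow back-pointers from the best final label.
    y = max(LABELS, key=lambda lab: best[-1][lab])
    path = [y]
    for i in range(len(tokens) - 1, 0, -1):
        y = back[i][y]
        path.append(y)
    return path[::-1]


def insert_commas(sentence: str) -> str:
    """Append a comma to every token labelled COMMA."""
    tokens = sentence.split()
    if not tokens:
        return ""
    labels = viterbi(tokens)
    return " ".join(t + "," if lab == "COMMA" else t for t, lab in zip(tokens, labels))


def split_at_commas(text: str) -> List[str]:
    """Split the comma-augmented sentence into sub-sentences."""
    return [part.strip() for part in text.split(",") if part.strip()]


print(insert_commas("The model reads the input which is long and then it translates the result"))
```

Viterbi decoding computes the same highest-scoring label sequence that a trained linear-chain CRF would output. Under these toy weights, the example prints `The model reads the input, which is long, and then it translates the result`, which then splits into three shorter sub-sentences. The negative `COMMA`→`COMMA` transition weight shows how transition features can discourage degenerate splits.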

Original language: English
Host publication title: Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data - 14th China National Conference, CCL 2015 and 3rd International Symposium, NLP-NABD 2015, Proceedings
Editors: Maosong Sun, Zhiyuan Liu, Yang Liu, Min Zhang
Publisher: Springer Verlag
Pages: 141-152
Number of pages: 12
ISBN (Print): 9783319258157
DOI
Publication status: Published - 2015
Event: 14th China National Conference on Chinese Computational Linguistics, CCL 2015 and 3rd International Symposium on Natural Language Processing Based on Naturally Annotated Big Data, NLP-NABD 2015 - Guangzhou, China
Duration: 13 Nov 2015 → 14 Nov 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9427
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 14th China National Conference on Chinese Computational Linguistics, CCL 2015 and 3rd International Symposium on Natural Language Processing Based on Naturally Annotated Big Data, NLP-NABD 2015
Country/Territory: China
City: Guangzhou
Period: 13/11/15 → 14/11/15
