
A hybrid sentence splitting method by comma insertion for machine translation with CRF

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

When writing formal articles, many English writers use long sentences with few punctuation marks. Because long sentences are difficult for machine translation systems, many researchers split them at punctuation marks before translation. Sentences with few punctuation marks, however, remain hard to handle this way. In this paper, we use a log-linear model (a conditional random field, CRF) to insert commas at appropriate positions in long sentences, splitting them into shorter sub-sentences that are easier to translate. Experimental results show that our method segments long sentences reasonably and improves machine translation quality.
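The abstract casts sentence splitting as comma insertion with a log-linear model, and the title names a CRF. The following is a minimal sketch, under stated assumptions, of how that can be framed as linear-chain sequence labeling: each token is tagged `O` or `COMMA` (insert a comma after it), the best tag sequence is found with Viterbi decoding, and the sentence is then split at every comma. The label set, the feature set (a bias plus the next word) and all weights are hypothetical and hand-set rather than trained; they are not the paper's features or model. A real system would learn the weights from comma-annotated text.

```python
from typing import Dict, List, Tuple

LABELS = ("O", "COMMA")  # COMMA = insert a comma after this token (assumed label set)

# Hypothetical hand-set weights, NOT learned and NOT the paper's parameters.
EMIT: Dict[Tuple[str, str], float] = {
    ("bias", "COMMA"): -1.5,          # commas are rare by default
    ("next=which", "COMMA"): 2.0,     # comma before a relative clause
    ("next=because", "COMMA"): 2.0,   # comma before a subordinate clause
    ("next=and", "COMMA"): 1.0,
}
TRANS: Dict[Tuple[str, str], float] = {("COMMA", "COMMA"): -3.0}  # discourage ",,"


def features(tokens: List[str], i: int) -> List[str]:
    """Hypothetical feature set: a bias plus the lowercased next token."""
    nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    return ["bias", f"next={nxt}"]


def emit(tokens: List[str], i: int, label: str) -> float:
    """Sum of the weights of the active features paired with `label`."""
    return sum(EMIT.get((f, label), 0.0) for f in features(tokens, i))


def viterbi(tokens: List[str]) -> List[str]:
    """Highest-scoring label sequence. The normalizer Z(x) of the log-linear
    model does not depend on the labels, so decoding can ignore it."""
    if not tokens:
        return []
    best = [{y: emit(tokens, 0, y) for y in LABELS}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(tokens)):
        best.append({})
        back.append({})
        for y in LABELS:
            prev = max(LABELS, key=lambda yp: best[i - 1][yp] + TRANS.get((yp, y), 0.0))
            best[i][y] = best[i - 1][prev] + TRANS.get((prev, y), 0.0) + emit(tokens, i, y)
            back[i][y] = prev
    y = max(LABELS, key=lambda lab: best[-1][lab])
    path = [y]
    for i in range(len(tokens) - 1, 0, -1):  # follow back-pointers to position 0
        y = back[i][y]
        path.append(y)
    return path[::-1]


def insert_commas(tokens: List[str]) -> List[str]:
    """Append a comma token after every token labeled COMMA."""
    out: List[str] = []
    for tok, lab in zip(tokens, viterbi(tokens)):
        out.append(tok)
        if lab == "COMMA":
            out.append(",")
    return out


def split_on_commas(tokens: List[str]) -> List[List[str]]:
    """Split at every comma, existing or inserted, and drop empty pieces."""
    pieces: List[List[str]] = [[]]
    for tok in tokens:
        if tok == ",":
            pieces.append([])
        else:
            pieces[-1].append(tok)
    return [p for p in pieces if p]


if __name__ == "__main__":
    sentence = "The model was trained on data which was collected because the corpus was small"
    for piece in split_on_commas(insert_commas(sentence.split())):
        print(" ".join(piece))
    # The model was trained on data
    # which was collected
    # because the corpus was small
```

Splitting on all commas, not only inserted ones, mirrors the punctuation-based splitting the abstract describes; the inserted commas simply give sentences with few punctuation marks more split points.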

Original language: English
Title of host publication: Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data - 14th China National Conference, CCL 2015 and 3rd International Symposium, NLP-NABD 2015, Proceedings
Editors: Maosong Sun, Zhiyuan Liu, Yang Liu, Min Zhang
Publisher: Springer Verlag
Pages: 141-152
Number of pages: 12
ISBN (Print): 9783319258157
Publication status: Published - 2015
Event: 14th China National Conference on Chinese Computational Linguistics, CCL 2015 and 3rd International Symposium on Natural Language Processing Based on Naturally Annotated Big Data, NLP-NABD 2015 - Guangzhou, China
Duration: 13 Nov 2015 → 14 Nov 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9427
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 14th China National Conference on Chinese Computational Linguistics, CCL 2015 and 3rd International Symposium on Natural Language Processing Based on Naturally Annotated Big Data, NLP-NABD 2015
Country/Territory: China
City: Guangzhou
Period: 13/11/15 → 14/11/15
