Classifying commas for patent machine translation

Hongzheng Li*, Yun Zhu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Commas are widespread in Chinese and play an important role in detecting the boundaries of basic units in sentences and discourse. For Chinese-English patent machine translation, this paper presents two methods that use rich linguistic information to classify commas as separating either sub-sentences or non-sub-sentences. The first method employs a word knowledge base and formal rules to determine the role of each comma, while the second uses machine learning approaches. The experimental results show that the overall F1 scores of the rule-based method exceed 93%, indicating that this approach performs well in classifying commas; the machine learning classifiers, by contrast, show some differences in performance. We also conclude that identifying the roles of commas can improve the quality of translation output.
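The rule-based idea in the abstract, and the F1 metric used to report its results, can be illustrated with a minimal Python sketch. Everything specific in it is an assumption: the toy verb lexicon `VERBS`, the cue words in `CLAUSE_CUES`, the `SUB`/`NON` labels, and the decision rule are hypothetical stand-ins, not the word knowledge base or formal rules from the paper. The sketch labels each full-width comma by inspecting the segment to its right, then scores predictions against gold labels with the standard per-class F1.

```python
# Hypothetical toy lexicon standing in for a word knowledge base;
# the paper's actual resource and rules are NOT reproduced here.
VERBS = {"包括", "连接", "设置", "用于", "固定"}
# Hypothetical clause-initial cue words that often open a new sub-sentence.
CLAUSE_CUES = {"并且", "其中", "从而", "使得"}


def split_on_commas(sentence: str) -> list[str]:
    """Split a Chinese sentence on the full-width comma '，'."""
    return sentence.split("，")


def classify_commas(sentence: str) -> list[str]:
    """Label each comma 'SUB' (separates sub-sentences) or 'NON'.

    Hypothetical rule: a comma is 'SUB' if the segment to its right
    starts with a clause cue word or contains a verb from the toy
    lexicon; otherwise it is treated as separating non-sub-sentences,
    e.g. items in a noun list.
    """
    segments = split_on_commas(sentence)
    labels = []
    for right in segments[1:]:  # one right-hand segment per comma
        starts_with_cue = any(right.startswith(cue) for cue in CLAUSE_CUES)
        has_verb = any(verb in right for verb in VERBS)
        labels.append("SUB" if starts_with_cue or has_verb else "NON")
    return labels


def f1_score(gold: list[str], pred: list[str], positive: str = "SUB") -> float:
    """Standard F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # First comma separates list items; second introduces a new clause.
    sentence = "该装置包括底座，支架和电机，其中电机固定在支架上"
    predicted = classify_commas(sentence)
    print(predicted)  # ['NON', 'SUB']
    print(f1_score(["NON", "SUB"], predicted))  # 1.0
```

In this toy sentence the first comma joins two noun phrases, so no cue word or verb follows it, while the second comma is followed by 其中 and starts a new clause. A real system would replace the substring lookups with word segmentation and part-of-speech information, which this sketch deliberately omits.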

Original language: English
Title of host publication: Machine Translation - 12th China Workshop, CWMT 2016, Revised Selected Papers
Editors: Shujie Liu, Muyun Yang
Publisher: Springer Verlag
Pages: 91-101
Number of pages: 11
ISBN (Print): 9789811036347
Publication status: Published - 2016
Externally published: Yes
Event: 12th China Workshop on Machine Translation, CWMT 2016 - Urumqi, China
Duration: 25 Aug 2016 to 26 Aug 2016

Publication series

Name: Communications in Computer and Information Science
Volume: 668
ISSN (Print): 1865-0929

Conference

Conference: 12th China Workshop on Machine Translation, CWMT 2016
Country/Territory: China
City: Urumqi
Period: 25/08/16 to 26/08/16

Keywords

  • Comma
  • Machine learning
  • Patent machine translation
  • Rule
