Classifying commas for patent machine translation

Hongzheng Li; Yun Zhu

doi:10.1007/978-981-10-3635-4_8

Corresponding author: Hongzheng Li

Beijing Normal University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Commas are used widely in Chinese and play an important role in detecting the boundaries of basic units in sentences and discourse. Targeting Chinese-English patent machine translation, this paper presents two methods that use rich linguistic information to classify commas according to whether they separate sub-sentences or non-sub-sentences. The first method employs a word knowledge base and formal rules to determine the roles of commas, while the second uses machine learning approaches. The experimental results show that the overall F1 scores of the rule-based method are higher than 93%, indicating that the approach performs well in classifying commas. The machine learning classifiers, on the other hand, show some differences. We also conclude that identifying commas can improve the quality of translation output.

Original language	English
Title of host publication	Machine Translation - 12th China Workshop, CWMT 2016, Revised Selected Papers
Editors	Shujie Liu, Muyun Yang
Publisher	Springer Verlag
Pages	91-101
Number of pages	11
ISBN (Print)	9789811036347
DOIs	https://doi.org/10.1007/978-981-10-3635-4_8
Publication status	Published - 2016
Externally published	Yes
Event	12th China Workshop on Machine Translation, CWMT 2016 - Urumqi, China; Duration: 25 Aug 2016 → 26 Aug 2016

Publication series

Name	Communications in Computer and Information Science
Volume	668
ISSN (Print)	1865-0929

Conference

Conference	12th China Workshop on Machine Translation, CWMT 2016
Country/Territory	China
City	Urumqi
Period	25/08/16 → 26/08/16

Keywords

Comma
Machine learning
Patent machine translation
Rule

Access to Document

10.1007/978-981-10-3635-4_8

Cite this

Li, H., & Zhu, Y. (2016). Classifying commas for patent machine translation. In S. Liu, & M. Yang (Eds.), Machine Translation - 12th China Workshop, CWMT 2016, Revised Selected Papers (pp. 91-101). (Communications in Computer and Information Science; Vol. 668). Springer Verlag. https://doi.org/10.1007/978-981-10-3635-4_8

@inproceedings{1038f423d6744db5b175726624f5e118,

title = "Classifying commas for patent machine translation",

keywords = "Comma, Machine learning, Patent machine translation, Rule",

author = "Hongzheng Li and Yun Zhu",

note = "Publisher Copyright: {\textcopyright} Springer Nature Singapore Pte Ltd. 2016.; 12th China Workshop on Machine Translation, CWMT 2016 ; Conference date: 25-08-2016 Through 26-08-2016",

year = "2016",

doi = "10.1007/978-981-10-3635-4_8",

language = "English",

isbn = "9789811036347",

series = "Communications in Computer and Information Science",

publisher = "Springer Verlag",

pages = "91--101",

editor = "Shujie Liu and Muyun Yang",

booktitle = "Machine Translation - 12th China Workshop, CWMT 2016, Revised Selected Papers",

address = "Germany",

}

Li, H & Zhu, Y 2016, Classifying commas for patent machine translation. in S Liu & M Yang (eds), Machine Translation - 12th China Workshop, CWMT 2016, Revised Selected Papers. Communications in Computer and Information Science, vol. 668, Springer Verlag, pp. 91-101, 12th China Workshop on Machine Translation, CWMT 2016, Urumqi, China, 25/08/16. https://doi.org/10.1007/978-981-10-3635-4_8

Classifying commas for patent machine translation. / Li, Hongzheng; Zhu, Yun.
Machine Translation - 12th China Workshop, CWMT 2016, Revised Selected Papers. ed. / Shujie Liu; Muyun Yang. Springer Verlag, 2016. p. 91-101 (Communications in Computer and Information Science; Vol. 668).

TY - GEN

T1 - Classifying commas for patent machine translation

AU - Li, Hongzheng

AU - Zhu, Yun

N1 - Publisher Copyright: © Springer Nature Singapore Pte Ltd. 2016.

PY - 2016

Y1 - 2016

KW - Comma

KW - Machine learning

KW - Patent machine translation

KW - Rule

UR - http://www.scopus.com/inward/record.url?scp=85010190264&partnerID=8YFLogxK

U2 - 10.1007/978-981-10-3635-4_8

DO - 10.1007/978-981-10-3635-4_8

M3 - Conference contribution

AN - SCOPUS:85010190264

SN - 9789811036347

T3 - Communications in Computer and Information Science

SP - 91

EP - 101

BT - Machine Translation - 12th China Workshop, CWMT 2016, Revised Selected Papers

A2 - Liu, Shujie

A2 - Yang, Muyun

PB - Springer Verlag

T2 - 12th China Workshop on Machine Translation, CWMT 2016

Y2 - 25 August 2016 through 26 August 2016

ER -
