Training set similarity based parameter selection for statistical machine translation

Xuewen Shi, Heyan Huang, Ping Jian*, Yi Kun Tang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Statistical machine translation (SMT) systems based on log-linear models are usually composed of multiple feature functions, each of which is assigned a weight as a model parameter. In this paper, we argue that different input source sentences may call for different model parameters. To adapt the model to different inputs, we propose a model parameter selection method for log-linear model based SMT systems. The method relies mainly on the characteristics of the feature functions themselves and makes no assumptions about unseen test sets. Experimental results on two language pairs (Zh-En and Ug-Zh) show that our method yields improvements of up to 2.4 and 2.2 BLEU points, respectively, and also demonstrate the good interpretability of the proposed method.
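
For readers unfamiliar with the setting, the following is a minimal sketch of the standard log-linear decision rule, in which each candidate translation is scored by a weighted sum of feature values. It is not the authors' selection method, which the abstract does not detail. The feature names (`tm`, `lm`), the candidate texts, and all numeric values are hypothetical and chosen only to show that different weight vectors can prefer different candidates.

```python
from typing import Dict, List


def loglinear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of feature values: sum_i lambda_i * h_i(e, f)."""
    return sum(weights[name] * value for name, value in features.items())


def best_candidate(candidates: List[dict], weights: Dict[str, float]) -> str:
    """Return the text of the candidate with the highest log-linear score."""
    top = max(candidates, key=lambda c: loglinear_score(c["features"], weights))
    return top["text"]


# Hypothetical candidates with made-up log-probability features
# ("tm" = translation model, "lm" = language model).
candidates = [
    {"text": "candidate A", "features": {"tm": -2.0, "lm": -6.0}},
    {"text": "candidate B", "features": {"tm": -4.0, "lm": -3.0}},
]

# Two hypothetical weight settings; a per-input selection scheme would
# choose between settings like these for each source sentence.
weights_tm_heavy = {"tm": 1.0, "lm": 0.2}
weights_lm_heavy = {"tm": 0.2, "lm": 1.0}

print(best_candidate(candidates, weights_tm_heavy))
print(best_candidate(candidates, weights_lm_heavy))
```

Under these assumed values the two weight settings select different candidates, which is the kind of sensitivity that motivates choosing parameters per input rather than fixing one global setting.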

Original language: English
Title of host publication: Web and Big Data - Second International Joint Conference, APWeb-WAIM 2018, Proceedings
Editors: Jianliang Xu, Yoshiharu Ishikawa, Yi Cai
Publisher: Springer Verlag
Pages: 63-71
Number of pages: 9
ISBN (Print): 9783319968896
Publication status: Published - 2018
Event: 2nd Asia Pacific Web and Web-Age Information Management Joint Conference on Web and Big Data, APWeb-WAIM 2018 - Macau, China
Duration: 23 Jul 2018 to 25 Jul 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10987 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2nd Asia Pacific Web and Web-Age Information Management Joint Conference on Web and Big Data, APWeb-WAIM 2018
Country/Territory: China
City: Macau
Period: 23/07/18 to 25/07/18

Keywords

  • Log-linear model
  • Parameter selection
  • Statistical machine translation
