Pretreatment for speech machine translation

Xiaofei Zhang; Chong Feng; Heyan Huang

doi:10.1007/978-3-642-16732-4_13

Xiaofei Zhang^*, Chong Feng, Heyan Huang

^*Corresponding author for this work

School of Computer Science

Chinese Academy of Sciences

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

Natural spoken language contains many meaningless modal particles and dittographies (unintended repetitions). Furthermore, ASR (automatic speech recognition) often makes recognition errors, and its output has no punctuation. As a result, translation quality is rather poor if the ASR results are translated directly by MT (machine translation). It is therefore necessary to transform the irregular ASR results into normalized text suitable for machine translation. This paper introduces a pretreatment approach based on a conditional random field model that deletes the meaningless modal particles and dittographies, corrects the recognition errors, and punctuates the ASR results before machine translation. Experiments show that an MT BLEU score of 0.2497 is obtained, an improvement of 18.4% over the MT baseline without pretreatment.
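
The three pretreatment operations named in the abstract (deletion, correction, punctuation) can be pictured as per-token edit labels applied to the ASR output. The following is a minimal sketch under stated assumptions, not the authors' implementation: the label scheme (`KEEP`, `DEL`, `SUB:<word>`, and a `+<mark>` punctuation suffix), the function name `apply_labels`, and the English toy utterance are all hypothetical. The labels are written by hand here, whereas in the paper a conditional random field model would predict them; this record does not describe the model's features or tag set.

```python
from typing import List, Tuple

# Hypothetical label scheme (an assumption, not taken from the paper):
#   KEEP        keep the token unchanged
#   DEL         delete a filler particle or a repeated token
#   SUB:<word>  replace a misrecognized token with <word>
#   any label may carry a "+<mark>" suffix to append punctuation
DELETE = "DEL"
SUBSTITUTE_PREFIX = "SUB:"


def apply_labels(tagged: List[Tuple[str, str]]) -> str:
    """Rebuild normalized text from (token, label) pairs."""
    out: List[str] = []
    for token, label in tagged:
        # Split off the optional punctuation suffix, e.g. "KEEP+," -> ("KEEP", ",").
        action, _, punct = label.partition("+")
        if action.startswith(SUBSTITUTE_PREFIX):
            token = action[len(SUBSTITUTE_PREFIX):]
        elif action == DELETE:
            token = ""
        if token:
            out.append(token)
        # Attach punctuation to the most recent surviving token, if any.
        if punct and out:
            out[-1] += punct
    return " ".join(out)


# Hand-labeled toy ASR output; a trained sequence tagger would supply the labels.
tagged = [
    ("uh", "DEL"), ("i", "KEEP"), ("i", "DEL"), ("want", "KEEP"),
    ("too", "SUB:two"), ("tickets", "KEEP+,"), ("um", "DEL"),
    ("please", "KEEP+."),
]
print(apply_labels(tagged))  # i want two tickets, please.
```

Casting all three operations as one labeling pass keeps the output aligned with the input tokens, which is the setting a linear-chain sequence model handles naturally. Whether the paper uses a single pass or separate stages is not stated here.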

Original language	English
Title of host publication	Computational Collective Intelligence
Subtitle of host publication	Technologies and Applications - Second International Conference, ICCCI 2010, Proceedings
Pages	113-121
Number of pages	9
Edition	PART 2
DOI	https://doi.org/10.1007/978-3-642-16732-4_13
Publication status	Published - 2010
Event	2nd International Conference on Computational Collective Intelligence - Technologies and Applications, ICCCI 2010 - Kaohsiung, Taiwan, China; Duration: 10 Nov 2010 → 12 Nov 2010

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number	PART 2
Volume	6422 LNAI
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	2nd International Conference on Computational Collective Intelligence - Technologies and Applications, ICCCI 2010
Country/Territory	Taiwan, China
City	Kaohsiung
Period	10 Nov 2010 → 12 Nov 2010

Cite this

Zhang, X., Feng, C., & Huang, H. (2010). Pretreatment for speech machine translation. In Computational Collective Intelligence: Technologies and Applications - Second International Conference, ICCCI 2010, Proceedings (PART 2 ed., pp. 113-121). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6422 LNAI, No. PART 2). https://doi.org/10.1007/978-3-642-16732-4_13

@inproceedings{ef6d08e5d60c499fb0a685a6108e33d8,

title = "Pretreatment for speech machine translation",

abstract = "Natural spoken language contains many meaningless modal particles and dittographies (unintended repetitions). Furthermore, ASR (automatic speech recognition) often makes recognition errors, and its output has no punctuation. As a result, translation quality is rather poor if the ASR results are translated directly by MT (machine translation). It is therefore necessary to transform the irregular ASR results into normalized text suitable for machine translation. This paper introduces a pretreatment approach based on a conditional random field model that deletes the meaningless modal particles and dittographies, corrects the recognition errors, and punctuates the ASR results before machine translation. Experiments show that an MT BLEU score of 0.2497 is obtained, an improvement of 18.4% over the MT baseline without pretreatment.",

keywords = "Automatic speech recognition, Conditional random field model, Pretreatment, Speech machine translation",

author = "Xiaofei Zhang and Chong Feng and Heyan Huang",

year = "2010",

doi = "10.1007/978-3-642-16732-4_13",

language = "English",

isbn = "3642167314",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

number = "PART 2",

pages = "113--121",

booktitle = "Computational Collective Intelligence",

edition = "PART 2",

note = "2nd International Conference on Computational Collective Intelligence - Technologies and Applications, ICCCI 2010 ; Conference date: 10-11-2010 Through 12-11-2010",

}

TY - GEN

T1 - Pretreatment for speech machine translation

AU - Zhang, Xiaofei

AU - Feng, Chong

AU - Huang, Heyan

PY - 2010

Y1 - 2010

N2 - Natural spoken language contains many meaningless modal particles and dittographies (unintended repetitions). Furthermore, ASR (automatic speech recognition) often makes recognition errors, and its output has no punctuation. As a result, translation quality is rather poor if the ASR results are translated directly by MT (machine translation). It is therefore necessary to transform the irregular ASR results into normalized text suitable for machine translation. This paper introduces a pretreatment approach based on a conditional random field model that deletes the meaningless modal particles and dittographies, corrects the recognition errors, and punctuates the ASR results before machine translation. Experiments show that an MT BLEU score of 0.2497 is obtained, an improvement of 18.4% over the MT baseline without pretreatment.

KW - Automatic speech recognition

KW - Conditional random field model

KW - Pretreatment

KW - Speech machine translation

UR - http://www.scopus.com/inward/record.url?scp=78649621878&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-16732-4_13

DO - 10.1007/978-3-642-16732-4_13

M3 - Conference contribution

AN - SCOPUS:78649621878

SN - 3642167314

SN - 9783642167317

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 113

EP - 121

BT - Computational Collective Intelligence

T2 - 2nd International Conference on Computational Collective Intelligence - Technologies and Applications, ICCCI 2010

Y2 - 10 November 2010 through 12 November 2010

ER -
