Patent literatures translation system based on Hadoop

Di Zhang; Heyan Huang; Yonggang Huang

doi:10.1007/978-3-642-55038-6_20

Di Zhang, Heyan Huang^*, Yonggang Huang

^*Corresponding author for this work

School of Computer Science

Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract

To address the slow response caused by the massive volume of patent literature, this paper proposes a patent literature translation system based on Hadoop. The paper presents a hybrid storage structure and a parallel translation model for massive patent literature. The hierarchical storage structure is based on HDFS (Hadoop Distributed File System), which stores the patent documents, and HBase, where the directories of that data are stored. This hybrid structure enables faster retrieval through the distributed file system. For translation, the Hadoop MapReduce framework is used. The MapReduce computation model can not only translate a patent document in a highly parallel manner but also process multiple documents simultaneously. The experimental results show that the proposed machine translation system achieves better translation performance than the conventional machine translation approach.
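The map and reduce roles described in the abstract can be illustrated with a minimal, single-process sketch. This is not the authors' implementation: the record gives no details of their mappers, reducers, or translation engine, so everything here is an assumption. The names `TOY_LEXICON`, `translate_sentence`, `map_phase`, `reduce_phase`, and `run_job` are hypothetical, the word-by-word lexicon lookup stands in for a real machine translation engine, and no Hadoop, HDFS, or HBase call is made.

```python
from collections import defaultdict

# Hypothetical toy lexicon standing in for a real machine translation engine.
TOY_LEXICON = {"专利": "patent", "文献": "literature", "翻译": "translation", "系统": "system"}


def translate_sentence(sentence):
    """Stand-in translator: word-by-word lookup on whitespace-separated tokens."""
    return " ".join(TOY_LEXICON.get(tok, tok) for tok in sentence.split())


def map_phase(doc_id, text):
    """Map: emit one (doc_id, (position, translated sentence)) pair per line."""
    for pos, sentence in enumerate(text.splitlines()):
        yield doc_id, (pos, translate_sentence(sentence))


def reduce_phase(doc_id, values):
    """Reduce: restore sentence order and reassemble the translated document."""
    return doc_id, "\n".join(s for _, s in sorted(values))


def run_job(documents):
    """Run map, shuffle (group by key), and reduce sequentially in one process."""
    grouped = defaultdict(list)
    for doc_id, text in documents.items():
        for key, value in map_phase(doc_id, text):
            grouped[key].append(value)
    return dict(reduce_phase(k, v) for k, v in grouped.items())
```

In this sketch each sentence is an independent map task and each document is an independent reduce key, which is one plausible reading of "translate in parallel" and "process multiple documents simultaneously"; on a real cluster the framework, not `run_job`, would distribute those tasks.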

Original language	English
Title of host publication	Future Information Technology
Publisher	Springer Verlag
Pages	127-135
Number of pages	9
ISBN (Print)	9783642550379
DOI	https://doi.org/10.1007/978-3-642-55038-6_20
Publication status	Published - 2014
Event	9th FTRA International Conference on Future Information Technology, FutureTech 2014 - Zhangjiajie, China. Duration: 28 May 2014 → 31 May 2014

Publication series

Name	Lecture Notes in Electrical Engineering
Volume	309 LNEE
ISSN (Print)	1876-1100
ISSN (Electronic)	1876-1119

Conference

Conference	9th FTRA International Conference on Future Information Technology, FutureTech 2014
Country/Territory	China
City	Zhangjiajie
Period	28/05/14 → 31/05/14

Access to document

10.1007/978-3-642-55038-6_20

Other files and links

Link to publication in Scopus

Cite this

@inproceedings{0532b7a8f2034111880526f9110817f1,

title = "Patent literatures translation system based on {H}adoop",

abstract = "In order to tackle the slow response caused by massive patent literatures, a patent literatures translation system based on Hadoop is proposed in this paper. The paper presents a hybrid storage structure and a parallel translation model for massive patent literatures. The hierarchical storage structure is based on HDFS (Hadoop Distributed File System), which stores the patent documents and HBase where directories of such data are stored. This hybrid structure enables faster retrieval through the distributed file system. In translation, The Hadoop MapReduce framework is utilized. The MapReduce computation model not only can translate the patent literatures in highly parallel, but also can process multiple documents simultaneously. The experimental results show that the proposed machine translation system in this paper has better translation performance than the conventional machine translation approach.",

keywords = "HBase, HDFS, Hadoop, MapReduce, machine translation, patent literatures",

author = "Di Zhang and Heyan Huang and Yonggang Huang",

year = "2014",

doi = "10.1007/978-3-642-55038-6_20",

language = "English",

isbn = "9783642550379",

series = "Lecture Notes in Electrical Engineering",

publisher = "Springer Verlag",

pages = "127--135",

booktitle = "Future Information Technology",

address = "Germany",

note = "9th FTRA International Conference on Future Information Technology, FutureTech 2014 ; Conference date: 28-05-2014 Through 31-05-2014",

}

Zhang, D, Huang, H & Huang, Y 2014, Patent literatures translation system based on Hadoop. in Future Information Technology. Lecture Notes in Electrical Engineering, vol. 309 LNEE, Springer Verlag, pp. 127-135, 9th FTRA International Conference on Future Information Technology, FutureTech 2014, Zhangjiajie, China, 28/05/14. https://doi.org/10.1007/978-3-642-55038-6_20

TY - GEN

T1 - Patent literatures translation system based on Hadoop

AU - Zhang, Di

AU - Huang, Heyan

AU - Huang, Yonggang

PY - 2014

Y1 - 2014

N2 - In order to tackle the slow response caused by massive patent literatures, a patent literatures translation system based on Hadoop is proposed in this paper. The paper presents a hybrid storage structure and a parallel translation model for massive patent literatures. The hierarchical storage structure is based on HDFS (Hadoop Distributed File System), which stores the patent documents and HBase where directories of such data are stored. This hybrid structure enables faster retrieval through the distributed file system. In translation, The Hadoop MapReduce framework is utilized. The MapReduce computation model not only can translate the patent literatures in highly parallel, but also can process multiple documents simultaneously. The experimental results show that the proposed machine translation system in this paper has better translation performance than the conventional machine translation approach.

AB - In order to tackle the slow response caused by massive patent literatures, a patent literatures translation system based on Hadoop is proposed in this paper. The paper presents a hybrid storage structure and a parallel translation model for massive patent literatures. The hierarchical storage structure is based on HDFS (Hadoop Distributed File System), which stores the patent documents and HBase where directories of such data are stored. This hybrid structure enables faster retrieval through the distributed file system. In translation, The Hadoop MapReduce framework is utilized. The MapReduce computation model not only can translate the patent literatures in highly parallel, but also can process multiple documents simultaneously. The experimental results show that the proposed machine translation system in this paper has better translation performance than the conventional machine translation approach.

KW - HBase

KW - HDFS

KW - Hadoop

KW - MapReduce

KW - machine translation

KW - patent literatures

UR - http://www.scopus.com/inward/record.url?scp=84902360871&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-55038-6_20

DO - 10.1007/978-3-642-55038-6_20

M3 - Conference contribution

AN - SCOPUS:84902360871

SN - 9783642550379

T3 - Lecture Notes in Electrical Engineering

SP - 127

EP - 135

BT - Future Information Technology

PB - Springer Verlag

T2 - 9th FTRA International Conference on Future Information Technology, FutureTech 2014

Y2 - 28 May 2014 through 31 May 2014

ER -
