Lit: A high performance massive data computing framework based on CPU/GPU cluster

Yanlong Zhai; Emmanuel Mbarushimana; Wei Li; Jing Zhang; Ying Guo

doi:10.1109/CLUSTER.2013.6702614

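School of Cyberspace Security

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

5 Citations (Scopus)

Abstract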

Big data processing is receiving a significant amount of interest as an important technology for revealing the information behind data, such as trends and characteristics. MapReduce is considered the most efficient distributed parallel data processing framework. However, some high-end applications, especially some scientific analyses, are both data-intensive and computation-intensive. Current big data processing techniques such as Hadoop are not designed for computation-intensive applications and thus have insufficient computational power. In this paper, we present Lit, a high performance massive data computing framework based on a CPU/GPU cluster. Lit integrates GPUs with Hadoop to improve the computational power of each node in the cluster. Since the architecture and programming model of GPUs differ from those of CPUs, Lit provides an annotation-based approach that automatically generates CUDA code from Hadoop code. Lit hides the complexity of programming a CPU/GPU cluster by providing an extended compiler and optimizer. To combine the simplified programming, scalability, and fault tolerance of Hadoop with the high computational power of GPUs, Lit extends Hadoop with a GPUClassloader that detects the GPU, generates and compiles CUDA code, and invokes the resulting shared library. Our experimental results show that Lit can achieve an average speedup of 1x to 3x over Hadoop on three typical applications.

Original language	English
Host publication title	2013 IEEE International Conference on Cluster Computing, CLUSTER 2013
DOI	https://doi.org/10.1109/CLUSTER.2013.6702614
Publication status	Published - 2013
Event	15th IEEE International Conference on Cluster Computing, CLUSTER 2013 - Indianapolis, IN, United States; Duration: 23 Sep 2013 → 27 Sep 2013

Publication series

Name	Proceedings - IEEE International Conference on Cluster Computing, ICCC
ISSN (Print)	1552-5244

Conference

Conference	15th IEEE International Conference on Cluster Computing, CLUSTER 2013
Country/Territory	United States
City	Indianapolis, IN
Period	23/09/13 → 27/09/13

Access to Document

10.1109/CLUSTER.2013.6702614

Other files and links

Link to publication in Scopus

Cite this

@inproceedings{86c1db03529246e397598168bc475342,
  title = "Lit: A high performance massive data computing framework based on CPU/GPU cluster",
  abstract = "Big data processing is receiving significant amount of interest as an important technology to reveal the information behind the data, such as trends, characteristics, etc. MapReduce is considered as the most efficient distributed parallel data processing framework. However, some high-end applications, especially some scientific analyses have both data-intensive and computation-intensive features. Current big data processing techniques like Hadoop are not designed for computation-intensive applications, thus have insufficient computation power. In this paper, we presented Lit, a high performance massive data computing framework based on CPU/GPU cluster. Lit integrated GPU with Hadoop to improve the computational power of each node in the cluster. Since the architecture and programming model of GPU is different from CPU, Lit provided an annotation based approach to automatically generate CUDA codes from Hadoop codes. Lit hided the complexity of programming on CPU/GPU cluster by providing extended compiler and optimizer. To utilize the simplified programming, scalability and fault tolerance benefits of Hadoop and combine them with the high performance computation power of GPU, Lit extended the Hadoop by applying a GPUClassloader to detect the GPU, generate and compile CUDA codes, and invoke the shared library. Our experimental results show that Lit can achieve an average speedup of 1x to 3x on three typical applications over Hadoop.",
  author = "Yanlong Zhai and Emmanuel Mbarushimana and Wei Li and Jing Zhang and Ying Guo",
  year = "2013",
  doi = "10.1109/CLUSTER.2013.6702614",
  language = "English",
  isbn = "9781479908981",
  series = "Proceedings - IEEE International Conference on Cluster Computing, ICCC",
  booktitle = "2013 IEEE International Conference on Cluster Computing, CLUSTER 2013",
  note = "15th IEEE International Conference on Cluster Computing, CLUSTER 2013 ; Conference date: 23-09-2013 Through 27-09-2013",
}

Zhai, Y, Mbarushimana, E, Li, W, Zhang, J & Guo, Y 2013, Lit: A high performance massive data computing framework based on CPU/GPU cluster. in 2013 IEEE International Conference on Cluster Computing, CLUSTER 2013., 6702614, Proceedings - IEEE International Conference on Cluster Computing, ICCC, 15th IEEE International Conference on Cluster Computing, CLUSTER 2013, Indianapolis, IN, United States, 23/09/13. https://doi.org/10.1109/CLUSTER.2013.6702614

Lit: A high performance massive data computing framework based on CPU/GPU cluster. / Zhai, Yanlong; Mbarushimana, Emmanuel; Li, Wei et al.
2013 IEEE International Conference on Cluster Computing, CLUSTER 2013. 2013. 6702614 (Proceedings - IEEE International Conference on Cluster Computing, ICCC).

TY  - GEN
T1  - Lit
T2  - 15th IEEE International Conference on Cluster Computing, CLUSTER 2013
AU  - Zhai, Yanlong
AU  - Mbarushimana, Emmanuel
AU  - Li, Wei
AU  - Zhang, Jing
AU  - Guo, Ying
PY  - 2013
Y1  - 2013
N2  - Big data processing is receiving significant amount of interest as an important technology to reveal the information behind the data, such as trends, characteristics, etc. MapReduce is considered as the most efficient distributed parallel data processing framework. However, some high-end applications, especially some scientific analyses have both data-intensive and computation-intensive features. Current big data processing techniques like Hadoop are not designed for computation-intensive applications, thus have insufficient computation power. In this paper, we presented Lit, a high performance massive data computing framework based on CPU/GPU cluster. Lit integrated GPU with Hadoop to improve the computational power of each node in the cluster. Since the architecture and programming model of GPU is different from CPU, Lit provided an annotation based approach to automatically generate CUDA codes from Hadoop codes. Lit hided the complexity of programming on CPU/GPU cluster by providing extended compiler and optimizer. To utilize the simplified programming, scalability and fault tolerance benefits of Hadoop and combine them with the high performance computation power of GPU, Lit extended the Hadoop by applying a GPUClassloader to detect the GPU, generate and compile CUDA codes, and invoke the shared library. Our experimental results show that Lit can achieve an average speedup of 1x to 3x on three typical applications over Hadoop.
AB  - Big data processing is receiving significant amount of interest as an important technology to reveal the information behind the data, such as trends, characteristics, etc. MapReduce is considered as the most efficient distributed parallel data processing framework. However, some high-end applications, especially some scientific analyses have both data-intensive and computation-intensive features. Current big data processing techniques like Hadoop are not designed for computation-intensive applications, thus have insufficient computation power. In this paper, we presented Lit, a high performance massive data computing framework based on CPU/GPU cluster. Lit integrated GPU with Hadoop to improve the computational power of each node in the cluster. Since the architecture and programming model of GPU is different from CPU, Lit provided an annotation based approach to automatically generate CUDA codes from Hadoop codes. Lit hided the complexity of programming on CPU/GPU cluster by providing extended compiler and optimizer. To utilize the simplified programming, scalability and fault tolerance benefits of Hadoop and combine them with the high performance computation power of GPU, Lit extended the Hadoop by applying a GPUClassloader to detect the GPU, generate and compile CUDA codes, and invoke the shared library. Our experimental results show that Lit can achieve an average speedup of 1x to 3x on three typical applications over Hadoop.
UR  - http://www.scopus.com/inward/record.url?scp=84893621114&partnerID=8YFLogxK
U2  - 10.1109/CLUSTER.2013.6702614
DO  - 10.1109/CLUSTER.2013.6702614
M3  - Conference contribution
AN  - SCOPUS:84893621114
SN  - 9781479908981
T3  - Proceedings - IEEE International Conference on Cluster Computing, ICCC
BT  - 2013 IEEE International Conference on Cluster Computing, CLUSTER 2013
Y2  - 23 September 2013 through 27 September 2013
ER  -