Design and optimization of a big data computing framework based on CPU/GPU cluster

Yanlong Zhai; Ying Guo; Qiurui Chen; Kai Yang; Emmanuel Mbarushimana

doi:10.1109/HPCC.and.EUC.2013.147

Design and optimization of a big data computing framework based on CPU/GPU cluster

Yanlong Zhai, Ying Guo, Qiurui Chen, Kai Yang, Emmanuel Mbarushimana

School of Cyberspace Science and Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract

Big data processing is receiving significant amount of interest as an important technology to reveal the information behind the data, such as trends, characteristics, etc. MapReduce is one of the most popular distributed parallel data processing framework. However, some high-end applications, especially some scientific analyses have both data-intensive and computation intensive features. Therefore, we have designed and implemented a high performance big data process framework called Lit, which leverages the power of Hadoop and GPUs. In this paper, we presented the basic design and architecture of Lit. More importantly, we spent a lot of effort on optimizing the communications between CPU and GPU. Lit integrated GPU with Hadoop to improve the computational power of each node in the cluster. To simplify the parallel programming, Lit provided an annotation based approach to automatically generate CUDA codes from Hadoop codes. Lit hid the complexity of programming on CPU/GPU cluster by providing extended compiler and optimizer. To utilize the simplified programming, scalability and fault tolerance benefits of Hadoop and combine them with the high performance computation power of GPU, Lit extended the Hadoop by applying a GPUClassloader to detect the GPU, generate and compile CUDA codes, and invoke the shared library. For all CPU-GPU co-processing systems, the communication with the GPU is the well-known performance bottleneck. We introduced data flow optimization approach to reduce unnecessary memory copies. Our experimental results show that Lit can achieve an average speedup of 1x to 3x on three typical applications over Hadoop, and the data flow optimization approach for the Lit can achieve about 16% performance gain.

Original language	English
Title of host publication	Proceedings - 2013 IEEE International Conference on High Performance Computing and Communications, HPCC 2013 and 2013 IEEE International Conference on Embedded and Ubiquitous Computing, EUC 2013
Publisher	IEEE Computer Society
Pages	1039-1046
Number of pages	8
ISBN (Print)	9780769550886
DOIs	https://doi.org/10.1109/HPCC.and.EUC.2013.147
Publication status	Published - 2014
Event	15th IEEE International Conference on High Performance Computing and Communications, HPCC 2013 and 11th IEEE/IFIP International Conference on Embedded and Ubiquitous Computing, EUC 2013 - Zhangjiajie, Hunan, China Duration: 13 Nov 2013 → 15 Nov 2013

Publication series

Name	Proceedings - 2013 IEEE International Conference on High Performance Computing and Communications, HPCC 2013 and 2013 IEEE International Conference on Embedded and Ubiquitous Computing, EUC 2013

Conference

Conference	15th IEEE International Conference on High Performance Computing and Communications, HPCC 2013 and 11th IEEE/IFIP International Conference on Embedded and Ubiquitous Computing, EUC 2013
Country/Territory	China
City	Zhangjiajie, Hunan
Period	13/11/13 → 15/11/13

Access to Document

10.1109/HPCC.and.EUC.2013.147

Cite this

Zhai, Y., Guo, Y., Chen, Q., Yang, K., & Mbarushimana, E. (2014). Design and optimization of a big data computing framework based on CPU/GPU cluster. In Proceedings - 2013 IEEE International Conference on High Performance Computing and Communications, HPCC 2013 and 2013 IEEE International Conference on Embedded and Ubiquitous Computing, EUC 2013 (pp. 1039-1046). Article 6832029 (Proceedings - 2013 IEEE International Conference on High Performance Computing and Communications, HPCC 2013 and 2013 IEEE International Conference on Embedded and Ubiquitous Computing, EUC 2013). IEEE Computer Society. https://doi.org/10.1109/HPCC.and.EUC.2013.147

Zhai, Yanlong ; Guo, Ying ; Chen, Qiurui et al. / Design and optimization of a big data computing framework based on CPU/GPU cluster. Proceedings - 2013 IEEE International Conference on High Performance Computing and Communications, HPCC 2013 and 2013 IEEE International Conference on Embedded and Ubiquitous Computing, EUC 2013. IEEE Computer Society, 2014. pp. 1039-1046 (Proceedings - 2013 IEEE International Conference on High Performance Computing and Communications, HPCC 2013 and 2013 IEEE International Conference on Embedded and Ubiquitous Computing, EUC 2013).

@inproceedings{ec427a56a6c24f968f13361c940660ed,

title = "Design and optimization of a big data computing framework based on CPU/GPU cluster",

abstract = "Big data processing is receiving significant amount of interest as an important technology to reveal the information behind the data, such as trends, characteristics, etc. MapReduce is one of the most popular distributed parallel data processing framework. However, some high-end applications, especially some scientific analyses have both data-intensive and computation intensive features. Therefore, we have designed and implemented a high performance big data process framework called Lit, which leverages the power of Hadoop and GPUs. In this paper, we presented the basic design and architecture of Lit. More importantly, we spent a lot of effort on optimizing the communications between CPU and GPU. Lit integrated GPU with Hadoop to improve the computational power of each node in the cluster. To simplify the parallel programming, Lit provided an annotation based approach to automatically generate CUDA codes from Hadoop codes. Lit hid the complexity of programming on CPU/GPU cluster by providing extended compiler and optimizer. To utilize the simplified programming, scalability and fault tolerance benefits of Hadoop and combine them with the high performance computation power of GPU, Lit extended the Hadoop by applying a GPUClassloader to detect the GPU, generate and compile CUDA codes, and invoke the shared library. For all CPU-GPU co-processing systems, the communication with the GPU is the well-known performance bottleneck. We introduced data flow optimization approach to reduce unnecessary memory copies. Our experimental results show that Lit can achieve an average speedup of 1x to 3x on three typical applications over Hadoop, and the data flow optimization approach for the Lit can achieve about 16% performance gain.",

author = "Yanlong Zhai and Ying Guo and Qiurui Chen and Kai Yang and Emmanuel Mbarushimana",

year = "2014",

doi = "10.1109/HPCC.and.EUC.2013.147",

language = "English",

isbn = "9780769550886",

series = "Proceedings - 2013 IEEE International Conference on High Performance Computing and Communications, HPCC 2013 and 2013 IEEE International Conference on Embedded and Ubiquitous Computing, EUC 2013",

publisher = "IEEE Computer Society",

pages = "1039--1046",

booktitle = "Proceedings - 2013 IEEE International Conference on High Performance Computing and Communications, HPCC 2013 and 2013 IEEE International Conference on Embedded and Ubiquitous Computing, EUC 2013",

address = "United States",

note = "15th IEEE International Conference on High Performance Computing and Communications, HPCC 2013 and 11th IEEE/IFIP International Conference on Embedded and Ubiquitous Computing, EUC 2013 ; Conference date: 13-11-2013 Through 15-11-2013",

}

Zhai, Y, Guo, Y, Chen, Q, Yang, K & Mbarushimana, E 2014, Design and optimization of a big data computing framework based on CPU/GPU cluster. in Proceedings - 2013 IEEE International Conference on High Performance Computing and Communications, HPCC 2013 and 2013 IEEE International Conference on Embedded and Ubiquitous Computing, EUC 2013., 6832029, Proceedings - 2013 IEEE International Conference on High Performance Computing and Communications, HPCC 2013 and 2013 IEEE International Conference on Embedded and Ubiquitous Computing, EUC 2013, IEEE Computer Society, pp. 1039-1046, 15th IEEE International Conference on High Performance Computing and Communications, HPCC 2013 and 11th IEEE/IFIP International Conference on Embedded and Ubiquitous Computing, EUC 2013, Zhangjiajie, Hunan, China, 13/11/13. https://doi.org/10.1109/HPCC.and.EUC.2013.147

Design and optimization of a big data computing framework based on CPU/GPU cluster. / Zhai, Yanlong; Guo, Ying; Chen, Qiurui et al.
Proceedings - 2013 IEEE International Conference on High Performance Computing and Communications, HPCC 2013 and 2013 IEEE International Conference on Embedded and Ubiquitous Computing, EUC 2013. IEEE Computer Society, 2014. p. 1039-1046 6832029 (Proceedings - 2013 IEEE International Conference on High Performance Computing and Communications, HPCC 2013 and 2013 IEEE International Conference on Embedded and Ubiquitous Computing, EUC 2013).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

TY - GEN

T1 - Design and optimization of a big data computing framework based on CPU/GPU cluster

AU - Zhai, Yanlong

AU - Guo, Ying

AU - Chen, Qiurui

AU - Yang, Kai

AU - Mbarushimana, Emmanuel

PY - 2014

Y1 - 2014

N2 - Big data processing is receiving significant amount of interest as an important technology to reveal the information behind the data, such as trends, characteristics, etc. MapReduce is one of the most popular distributed parallel data processing framework. However, some high-end applications, especially some scientific analyses have both data-intensive and computation intensive features. Therefore, we have designed and implemented a high performance big data process framework called Lit, which leverages the power of Hadoop and GPUs. In this paper, we presented the basic design and architecture of Lit. More importantly, we spent a lot of effort on optimizing the communications between CPU and GPU. Lit integrated GPU with Hadoop to improve the computational power of each node in the cluster. To simplify the parallel programming, Lit provided an annotation based approach to automatically generate CUDA codes from Hadoop codes. Lit hid the complexity of programming on CPU/GPU cluster by providing extended compiler and optimizer. To utilize the simplified programming, scalability and fault tolerance benefits of Hadoop and combine them with the high performance computation power of GPU, Lit extended the Hadoop by applying a GPUClassloader to detect the GPU, generate and compile CUDA codes, and invoke the shared library. For all CPU-GPU co-processing systems, the communication with the GPU is the well-known performance bottleneck. We introduced data flow optimization approach to reduce unnecessary memory copies. Our experimental results show that Lit can achieve an average speedup of 1x to 3x on three typical applications over Hadoop, and the data flow optimization approach for the Lit can achieve about 16% performance gain.

AB - Big data processing is receiving significant amount of interest as an important technology to reveal the information behind the data, such as trends, characteristics, etc. MapReduce is one of the most popular distributed parallel data processing framework. However, some high-end applications, especially some scientific analyses have both data-intensive and computation intensive features. Therefore, we have designed and implemented a high performance big data process framework called Lit, which leverages the power of Hadoop and GPUs. In this paper, we presented the basic design and architecture of Lit. More importantly, we spent a lot of effort on optimizing the communications between CPU and GPU. Lit integrated GPU with Hadoop to improve the computational power of each node in the cluster. To simplify the parallel programming, Lit provided an annotation based approach to automatically generate CUDA codes from Hadoop codes. Lit hid the complexity of programming on CPU/GPU cluster by providing extended compiler and optimizer. To utilize the simplified programming, scalability and fault tolerance benefits of Hadoop and combine them with the high performance computation power of GPU, Lit extended the Hadoop by applying a GPUClassloader to detect the GPU, generate and compile CUDA codes, and invoke the shared library. For all CPU-GPU co-processing systems, the communication with the GPU is the well-known performance bottleneck. We introduced data flow optimization approach to reduce unnecessary memory copies. Our experimental results show that Lit can achieve an average speedup of 1x to 3x on three typical applications over Hadoop, and the data flow optimization approach for the Lit can achieve about 16% performance gain.

UR - http://www.scopus.com/inward/record.url?scp=84903946827&partnerID=8YFLogxK

U2 - 10.1109/HPCC.and.EUC.2013.147

DO - 10.1109/HPCC.and.EUC.2013.147

M3 - Conference contribution

AN - SCOPUS:84903946827

SN - 9780769550886

T3 - Proceedings - 2013 IEEE International Conference on High Performance Computing and Communications, HPCC 2013 and 2013 IEEE International Conference on Embedded and Ubiquitous Computing, EUC 2013

SP - 1039

EP - 1046

BT - Proceedings - 2013 IEEE International Conference on High Performance Computing and Communications, HPCC 2013 and 2013 IEEE International Conference on Embedded and Ubiquitous Computing, EUC 2013

PB - IEEE Computer Society

T2 - 15th IEEE International Conference on High Performance Computing and Communications, HPCC 2013 and 11th IEEE/IFIP International Conference on Embedded and Ubiquitous Computing, EUC 2013

Y2 - 13 November 2013 through 15 November 2013

ER -

Zhai Y, Guo Y, Chen Q, Yang K, Mbarushimana E. Design and optimization of a big data computing framework based on CPU/GPU cluster. In Proceedings - 2013 IEEE International Conference on High Performance Computing and Communications, HPCC 2013 and 2013 IEEE International Conference on Embedded and Ubiquitous Computing, EUC 2013. IEEE Computer Society. 2014. p. 1039-1046. 6832029. (Proceedings - 2013 IEEE International Conference on High Performance Computing and Communications, HPCC 2013 and 2013 IEEE International Conference on Embedded and Ubiquitous Computing, EUC 2013). doi: 10.1109/HPCC.and.EUC.2013.147

Design and optimization of a big data computing framework based on CPU/GPU cluster

Abstract

Publication series

Conference

Access to Document

Other files and links

Fingerprint

Cite this