TY - GEN
T1 - Lit
T2 - 15th IEEE International Conference on Cluster Computing, CLUSTER 2013
AU - Zhai, Yanlong
AU - Mbarushimana, Emmanuel
AU - Li, Wei
AU - Zhang, Jing
AU - Guo, Ying
PY - 2013
Y1 - 2013
N2 - Big data processing is receiving significant amount of interest as an important technology to reveal the information behind the data, such as trends, characteristics, etc. MapReduce is considered as the most efficient distributed parallel data processing framework. However, some high-end applications, especially some scientific analyses have both data-intensive and computation-intensive features. Current big data processing techniques like Hadoop are not designed for computation-intensive applications, thus have insufficient computation power. In this paper, we presented Lit, a high performance massive data computing framework based on CPU/GPU cluster. Lit integrated GPU with Hadoop to improve the computational power of each node in the cluster. Since the architecture and programming model of GPU is different from CPU, Lit provided an annotation based approach to automatically generate CUDA codes from Hadoop codes. Lit hided the complexity of programming on CPU/GPU cluster by providing extended compiler and optimizer. To utilize the simplified programming, scalability and fault tolerance benefits of Hadoop and combine them with the high performance computation power of GPU, Lit extended the Hadoop by applying a GPUClassloader to detect the GPU, generate and compile CUDA codes, and invoke the shared library. Our experimental results show that Lit can achieve an average speedup of 1x to 3x on three typical applications over Hadoop.
AB - Big data processing is receiving significant amount of interest as an important technology to reveal the information behind the data, such as trends, characteristics, etc. MapReduce is considered as the most efficient distributed parallel data processing framework. However, some high-end applications, especially some scientific analyses have both data-intensive and computation-intensive features. Current big data processing techniques like Hadoop are not designed for computation-intensive applications, thus have insufficient computation power. In this paper, we presented Lit, a high performance massive data computing framework based on CPU/GPU cluster. Lit integrated GPU with Hadoop to improve the computational power of each node in the cluster. Since the architecture and programming model of GPU is different from CPU, Lit provided an annotation based approach to automatically generate CUDA codes from Hadoop codes. Lit hided the complexity of programming on CPU/GPU cluster by providing extended compiler and optimizer. To utilize the simplified programming, scalability and fault tolerance benefits of Hadoop and combine them with the high performance computation power of GPU, Lit extended the Hadoop by applying a GPUClassloader to detect the GPU, generate and compile CUDA codes, and invoke the shared library. Our experimental results show that Lit can achieve an average speedup of 1x to 3x on three typical applications over Hadoop.
UR - http://www.scopus.com/inward/record.url?scp=84893621114&partnerID=8YFLogxK
U2 - 10.1109/CLUSTER.2013.6702614
DO - 10.1109/CLUSTER.2013.6702614
M3 - Conference contribution
AN - SCOPUS:84893621114
SN - 9781479908981
T3 - Proceedings - IEEE International Conference on Cluster Computing, ICCC
BT - 2013 IEEE International Conference on Cluster Computing, CLUSTER 2013
Y2 - 23 September 2013 through 27 September 2013
ER -