TY - JOUR
T1 - ComMapReduce
T2 - An improvement of MapReduce with lightweight communication mechanisms
AU - Ding, Linlin
AU - Wang, Guoren
AU - Xin, Junchang
AU - Wang, Xiaoyang
AU - Huang, Shan
AU - Zhang, Rui
PY - 2013/11
Y1 - 2013/11
N2 - As a parallel programming framework, MapReduce can process scalable and parallel applications with large-scale datasets. The executions of Mappers and Reducers are independent of each other. There is no communication among Mappers, nor among Reducers. When the amount of final results is much smaller than the original data, it is a waste of time processing the unpromising intermediate data. We observe that this waste can be significantly reduced by simple communication mechanisms to enhance the performance of MapReduce. In this paper, we propose ComMapReduce, an efficient framework that extends and improves MapReduce for big data applications in the cloud. ComMapReduce can effectively obtain certain shared information with efficient lightweight communication mechanisms. Three basic communication strategies, Lazy, Eager and Hybrid, and two optimization communication strategies, Prepositive and Postpositive, are proposed to obtain the shared information and effectively process big data applications. We also illustrate the implementations of three typical applications with large-scale datasets on ComMapReduce. Our extensive experiments demonstrate that ComMapReduce outperforms MapReduce in all metrics without affecting the existing characteristics of MapReduce.
AB - As a parallel programming framework, MapReduce can process scalable and parallel applications with large-scale datasets. The executions of Mappers and Reducers are independent of each other. There is no communication among Mappers, nor among Reducers. When the amount of final results is much smaller than the original data, it is a waste of time processing the unpromising intermediate data. We observe that this waste can be significantly reduced by simple communication mechanisms to enhance the performance of MapReduce. In this paper, we propose ComMapReduce, an efficient framework that extends and improves MapReduce for big data applications in the cloud. ComMapReduce can effectively obtain certain shared information with efficient lightweight communication mechanisms. Three basic communication strategies, Lazy, Eager and Hybrid, and two optimization communication strategies, Prepositive and Postpositive, are proposed to obtain the shared information and effectively process big data applications. We also illustrate the implementations of three typical applications with large-scale datasets on ComMapReduce. Our extensive experiments demonstrate that ComMapReduce outperforms MapReduce in all metrics without affecting the existing characteristics of MapReduce.
KW - Communication mechanism
KW - Hadoop
KW - MapReduce
UR - http://www.scopus.com/inward/record.url?scp=84889089554&partnerID=8YFLogxK
U2 - 10.1016/j.datak.2013.04.004
DO - 10.1016/j.datak.2013.04.004
M3 - Article
AN - SCOPUS:84889089554
SN - 0169-023X
VL - 88
SP - 224
EP - 247
JO - Data and Knowledge Engineering
JF - Data and Knowledge Engineering
ER -