TY - GEN
T1 - Diagnosing virtualized Hadoop performance from benchmark results
T2 - 3rd IEEE International Congress on Big Data, BigData Congress 2014
AU - Fan, Jun
AU - Li, Xinhui
AU - Liu, Chi Harold
AU - Buell, Jeffrey
AU - Lu, Gavin
AU - Lu, Luke
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/9/22
Y1 - 2014/9/22
N2 - Hadoop is emerging as one of the leading frameworks used by enterprises to help make better business decisions on large data sets. Virtualization technology brings plenty of benefits to Hadoop, including higher resource utilization and cluster reliability. However, these benefits mean nothing to users if unacceptable performance degradation happens from physical to virtual platform. Existing efforts on virtualized Hadoop performance find that improper configurations of network and storage with open sourced virtual deployment cause huge overhead on system performance. However, complexity of hardware and software including virtualization configurations and various scale of deployment also makes performance tuning still too hard a practice to execute. To span that gap of virtualized Hadoop adoption, in this paper, we propose a performance diagnostic methodology that integrates statistical analysis from different layers, and design a heuristic performance diagnostic tool which evaluates the validity and correctness of virtualized Hadoop by analyzing the job traces of popular big data benchmarks. By using this tool, users could quickly identify the bottleneck according to hints provided by this tool, further confirm the diagnosis by referring to performance utilities provided by guest OS and hypervisor, and continue tuning performance for virtualized Hadoop by multiple runs of this tool.
AB - Hadoop is emerging as one of the leading frameworks used by enterprises to help make better business decisions on large data sets. Virtualization technology brings plenty of benefits to Hadoop, including higher resource utilization and cluster reliability. However, these benefits mean nothing to users if unacceptable performance degradation happens from physical to virtual platform. Existing efforts on virtualized Hadoop performance find that improper configurations of network and storage with open sourced virtual deployment cause huge overhead on system performance. However, complexity of hardware and software including virtualization configurations and various scale of deployment also makes performance tuning still too hard a practice to execute. To span that gap of virtualized Hadoop adoption, in this paper, we propose a performance diagnostic methodology that integrates statistical analysis from different layers, and design a heuristic performance diagnostic tool which evaluates the validity and correctness of virtualized Hadoop by analyzing the job traces of popular big data benchmarks. By using this tool, users could quickly identify the bottleneck according to hints provided by this tool, further confirm the diagnosis by referring to performance utilities provided by guest OS and hypervisor, and continue tuning performance for virtualized Hadoop by multiple runs of this tool.
UR - http://www.scopus.com/inward/record.url?scp=84923860933&partnerID=8YFLogxK
U2 - 10.1109/BigData.Congress.2014.89
DO - 10.1109/BigData.Congress.2014.89
M3 - Conference contribution
AN - SCOPUS:84923860933
T3 - Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014
SP - 578
EP - 585
BT - Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014
A2 - Chen, Peter
A2 - Jain, Hemant
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 27 June 2014 through 2 July 2014
ER -