Diagnosing virtualized Hadoop performance from benchmark results: An exploratory study

Jun Fan*, Xinhui Li, Chi Harold Liu, Jeffrey Buell, Gavin Lu, Luke Lu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

3 Citations (Scopus)

Abstract

Hadoop is emerging as one of the leading frameworks enterprises use to make better business decisions from large data sets. Virtualization brings many benefits to Hadoop, including higher resource utilization and cluster reliability. However, these benefits mean nothing to users if moving from a physical to a virtual platform causes unacceptable performance degradation. Existing work on virtualized Hadoop performance finds that improper network and storage configurations in open-source virtual deployments impose substantial overhead on system performance. Moreover, the complexity of the hardware and software stack, including virtualization configurations and varying deployment scales, still makes performance tuning too hard to carry out in practice. To bridge this gap in virtualized Hadoop adoption, we propose a performance diagnostic methodology that integrates statistical analysis from different layers, and we design a heuristic performance diagnostic tool that evaluates the validity and correctness of a virtualized Hadoop deployment by analyzing the job traces of popular big data benchmarks. With this tool, users can quickly identify the bottleneck from the hints it provides, confirm the diagnosis using the performance utilities of the guest OS and hypervisor, and continue tuning virtualized Hadoop performance through repeated runs of the tool.

Original language: English
Title of host publication: Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014
Editors: Peter Chen, Peter Chen, Hemant Jain
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 578-585
Number of pages: 8
ISBN (Electronic): 9781479950577
Publication status: Published - 22 Sept 2014
Event: 3rd IEEE International Congress on Big Data, BigData Congress 2014 - Anchorage, United States
Duration: 27 Jun 2014 to 2 Jul 2014

Publication series

Name: Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014

Conference

Conference: 3rd IEEE International Congress on Big Data, BigData Congress 2014
Country/Territory: United States
City: Anchorage
Period: 27/06/14 to 2/07/14
