Benchmarking Big Data Systems: A Review

Rui Han*, Lizy Kurian John, Jianfeng Zhan

*Corresponding author for this work

Research output: Contribution to journal > Article > peer-review

61 Citations (Scopus)

Abstract

With the fast development of big data systems in recent years, a variety of open-source benchmarks have been built to evaluate and compare the workloads on these systems, and to promote their technology improvement. However, to date no comprehensive survey has been written on this topic. This paper attempts to fill the void by presenting a review of the state-of-the-art big data benchmarking efforts. The paper first gives an overview of popular open-source benchmarks from the point of view of big data systems. It then reviews three important aspects of benchmarking: workload generation techniques, workload input data generation techniques, and the metrics used to assess systems. For each aspect, the paper divides the surveyed benchmarks into categories and describes some representative benchmarks in each category, rather than all benchmarks listed, followed by a discussion of potential research directions to motivate future work in this area.

Original language: English
Pages (from-to): 580-597
Number of pages: 18
Journal: IEEE Transactions on Services Computing
Volume: 11
Issue number: 3
Publication status: Published - 1 May 2018
Externally published: Yes

Keywords

  • Big data systems
  • benchmarks
  • input data
  • metrics
  • workloads
