Abstract
With the rapid development of big data systems in recent years, a variety of open-source benchmarks have been built to evaluate and compare the workloads on these systems and to promote their technological improvement. However, to date no comprehensive survey has been written on this topic. This paper attempts to fill that gap by presenting a review of state-of-the-art big data benchmarking efforts. The paper first gives an overview of popular open-source benchmarks from the perspective of big data systems. It then reviews three important aspects of benchmarking: workload generation techniques, workload input data generation techniques, and the metrics used to assess systems. For each aspect, the paper divides the surveyed benchmarks into categories and describes representative benchmarks in each category rather than every benchmark listed. It concludes with a discussion of potential research directions to motivate future work in this area.
Original language | English |
---|---|
Pages (from-to) | 580-597 |
Number of pages | 18 |
Journal | IEEE Transactions on Services Computing |
Volume | 11 |
Issue number | 3 |
Publication status | Published - 1 May 2018 |
Externally published | Yes |
Keywords
- Big data systems
- benchmarks
- input data
- metrics
- workloads