Benchmarking Big Data Systems: A Review

Rui Han; Lizy Kurian John; Jianfeng Zhan

doi:10.1109/TSC.2017.2730882

Rui Han^*, Lizy Kurian John, Jianfeng Zhan

^*Corresponding author for this work

Research output: Contribution to journal › Article › Peer-reviewed

62 Citations (Scopus)

Abstract

With the fast development of big data systems in recent years, a variety of open-source benchmarks have been built to evaluate and compare the workloads on these systems, and to promote their technology improvement. However, to date no comprehensive survey has been written on this topic. This paper attempts to fill the void by presenting a review of the state-of-the-art big data benchmarking efforts. The paper first gives an overview of popular open-source benchmarks from the point of view of big data systems. It then reviews the three important aspects of benchmarking: workload generation techniques, workload input data generation techniques, and metrics used to assess systems. For each aspect, the paper divides the surveyed benchmarks into different categories and describes some representative benchmarks, rather than all benchmarks listed, in each category, followed by a discussion of potential research directions to motivate future work in this area.

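Original language	English
Pages (from-to)	580-597
Number of pages	18
Journal	IEEE Transactions on Services Computing
Volume	11
Issue number	3
DOI	https://doi.org/10.1109/TSC.2017.2730882
Publication status	Published - 1 May 2018
Externally published	Yes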

Cite this

Han, R., John, L. K., & Zhan, J. (2018). Benchmarking Big Data Systems: A Review. IEEE Transactions on Services Computing, 11(3), 580-597. https://doi.org/10.1109/TSC.2017.2730882

@article{4e9af11a560141ef9001f31c35646344,
  title = "Benchmarking Big Data Systems: A Review",
  abstract = "With the fast development of big data systems in recent years, a variety of open-source benchmarks have been built to evaluate and compare the workloads on these systems, and to promote their technology improvement. However, to date no comprehensive survey has been written on this topic. This paper attempts to fill the void by presenting a review of the state-of-the-art big data benchmarking efforts. The paper first gives an overview of popular open-source benchmarks from the point of view of big data systems. It then reviews the three important aspects of benchmarking: workload generation techniques, workload input data generation techniques, and metrics used to assess systems. For each aspect, the paper divides the surveyed benchmarks into different categories and describes some representative benchmarks, rather than all benchmarks listed, in each category, followed by a discussion of potential research directions to motivate future work in this area.",
  keywords = "Big data systems, benchmarks, input data, metrics, workloads",
  author = "Rui Han and John, {Lizy Kurian} and Jianfeng Zhan",
  note = "Publisher Copyright: {\textcopyright} 2008-2012 IEEE.",
  year = "2018",
  month = may,
  day = "1",
  doi = "10.1109/TSC.2017.2730882",
  language = "English",
  volume = "11",
  pages = "580--597",
  journal = "IEEE Transactions on Services Computing",
  issn = "1939-1374",
  publisher = "Institute of Electrical and Electronics Engineers Inc.",
  number = "3",
}

TY  - JOUR
T1  - Benchmarking Big Data Systems
T2  - A Review
AU  - Han, Rui
AU  - John, Lizy Kurian
AU  - Zhan, Jianfeng
PY  - 2018/5/1
Y1  - 2018/5/1
AB  - With the fast development of big data systems in recent years, a variety of open-source benchmarks have been built to evaluate and compare the workloads on these systems, and to promote their technology improvement. However, to date no comprehensive survey has been written on this topic. This paper attempts to fill the void by presenting a review of the state-of-the-art big data benchmarking efforts. The paper first gives an overview of popular open-source benchmarks from the point of view of big data systems. It then reviews the three important aspects of benchmarking: workload generation techniques, workload input data generation techniques, and metrics used to assess systems. For each aspect, the paper divides the surveyed benchmarks into different categories and describes some representative benchmarks, rather than all benchmarks listed, in each category, followed by a discussion of potential research directions to motivate future work in this area.
KW  - Big data systems
KW  - benchmarks
KW  - input data
KW  - metrics
KW  - workloads
UR  - http://www.scopus.com/inward/record.url?scp=85028922854&partnerID=8YFLogxK
U2  - 10.1109/TSC.2017.2730882
DO  - 10.1109/TSC.2017.2730882
M3  - Article
AN  - SCOPUS:85028922854
SN  - 1939-1374
VL  - 11
SP  - 580
EP  - 597
JO  - IEEE Transactions on Services Computing
JF  - IEEE Transactions on Services Computing
IS  - 3
ER  -
