On big data benchmarking

Rui Han; Lu Xiaoyi; Xu Jiangtao

doi:10.1007/978-3-319-13021-7_1

Rui Han^*, Lu Xiaoyi, Xu Jiangtao

^*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

16 Citations (Scopus)

Abstract

Big data systems address the challenges of capturing, storing, managing, analyzing, and visualizing big data. Within this context, developing benchmarks to evaluate and compare big data systems has become an active topic for both research and industry communities. To date, most state-of-the-art big data benchmarks are designed for specific types of systems. Based on our experience, however, we argue that, given the complexity, diversity, and rapid evolution of big data systems, big data benchmarks must include a diversity of data and workloads for the sake of fairness. Given this motivation, in this paper we first propose the key requirements and challenges in developing big data benchmarks from two perspectives: generating data with the 4V properties of big data (i.e., volume, velocity, variety, and veracity), and generating tests with comprehensive workloads for big data systems. We then present a methodology for big data benchmarking designed to address these challenges. Next, state-of-the-art benchmarks are summarized and compared, followed by our vision for future research directions.

Original language	English
Title of host publication	Big Data Benchmarks, Performance Optimization, and Emerging Hardware - 4th and 5th Workshops, BPOE 2014, Revised Selected Papers
Editors	Jianfeng Zhan, Rui Han, Chuliang Weng
Publisher	Springer Verlag
Pages	3-18
Number of pages	16
ISBN (Electronic)	9783319130200
DOIs	https://doi.org/10.1007/978-3-319-13021-7_1
Publication status	Published - 2014
Externally published	Yes

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	8807
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Keywords

Benchmark
Big data systems
Data
Tests

Access to Document

10.1007/978-3-319-13021-7_1

Cite this

Han, R., Xiaoyi, L., & Jiangtao, X. (2014). On big data benchmarking. In J. Zhan, R. Han, & C. Weng (Eds.), Big Data Benchmarks, Performance Optimization, and Emerging Hardware - 4th and 5th Workshops, BPOE 2014, Revised Selected Papers (pp. 3-18). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8807). Springer Verlag. https://doi.org/10.1007/978-3-319-13021-7_1

@inbook{80dd64a50c584c1e9153ae9ec494d97f,

title = "On big data benchmarking",

keywords = "Benchmark, Big data systems, Data, Tests",

author = "Rui Han and Lu Xiaoyi and Xu Jiangtao",

note = "Publisher Copyright: {\textcopyright} Springer International Publishing Switzerland 2014.",

year = "2014",

doi = "10.1007/978-3-319-13021-7_1",

language = "English",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Verlag",

pages = "3--18",

editor = "Jianfeng Zhan and Rui Han and Chuliang Weng",

booktitle = "Big Data Benchmarks, Performance Optimization, and Emerging Hardware - 4th and 5th Workshops, BPOE 2014, Revised Selected Papers",

address = "Germany",

}

TY - CHAP

T1 - On big data benchmarking

AU - Han, Rui

AU - Xiaoyi, Lu

AU - Jiangtao, Xu

N1 - Publisher Copyright: © Springer International Publishing Switzerland 2014.

PY - 2014

Y1 - 2014

KW - Benchmark

KW - Big data systems

KW - Data

KW - Tests

UR - http://www.scopus.com/inward/record.url?scp=84921390128&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-13021-7_1

DO - 10.1007/978-3-319-13021-7_1

M3 - Chapter

AN - SCOPUS:84921390128

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 3

EP - 18

BT - Big Data Benchmarks, Performance Optimization, and Emerging Hardware - 4th and 5th Workshops, BPOE 2014, Revised Selected Papers

A2 - Zhan, Jianfeng

A2 - Han, Rui

A2 - Weng, Chuliang

PB - Springer Verlag

ER -
