TY - CHAP
T1 - On big data benchmarking
AU - Han, Rui
AU - Lu, Xiaoyi
AU - Xu, Jiangtao
N1 - Publisher Copyright: © Springer International Publishing Switzerland 2014.
PY - 2014
Y1 - 2014
AB - Big data systems address the challenges of capturing, storing, managing, analyzing, and visualizing big data. Within this context, developing benchmarks to evaluate and compare big data systems has become an active topic for both research and industry communities. To date, most state-of-the-art big data benchmarks are designed for specific types of systems. Based on our experience, however, we argue that, considering the complexity, diversity, and rapid evolution of big data systems, big data benchmarks must include a diversity of data and workloads for the sake of fairness. Given this motivation, in this paper, we first propose the key requirements and challenges in developing big data benchmarks from the perspectives of generating data with the 4V properties of big data (i.e., volume, velocity, variety, and veracity) and generating tests with comprehensive workloads for big data systems. We then present a methodology for big data benchmarking designed to address these challenges. Next, the state of the art is summarized and compared, followed by our vision for future research directions.
KW - Benchmark
KW - Big data systems
KW - Data
KW - Tests
UR - http://www.scopus.com/inward/record.url?scp=84921390128&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-13021-7_1
DO - 10.1007/978-3-319-13021-7_1
M3 - Chapter
AN - SCOPUS:84921390128
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 3
EP - 18
BT - Big Data Benchmarks, Performance Optimization, and Emerging Hardware - 4th and 5th Workshops, BPOE 2014, Revised Selected Papers
A2 - Zhan, Jianfeng
A2 - Han, Rui
A2 - Weng, Chuliang
PB - Springer Verlag
ER -