Characterizing and subsetting big data workloads

Zhen Jia, Jianfeng Zhan*, Lei Wang, Rui Han, Sally A. McKee, Qiang Yang, Chunjie Luo, Jingwei Li

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

59 Citations (Scopus)

Abstract

Big data benchmark suites must include a diversity of data and workloads to be useful in fairly evaluating big data systems and architectures. However, using truly comprehensive benchmarks poses great challenges for the architecture community. First, we need to thoroughly understand the behaviors of a variety of workloads. Second, our usual simulation-based research methods become prohibitively expensive for big data. As big data is an emerging field, more and more software stacks are being proposed to facilitate the development of big data applications, which aggravates these challenges. In this paper, we first use Principal Component Analysis (PCA) to identify the most important characteristics from 45 metrics to characterize big data workloads from BigDataBench, a comprehensive big data benchmark suite. Second, we apply a clustering technique to the principal components obtained from the PCA to investigate the similarity among big data workloads, and we verify the importance of including different software stacks for big data benchmarking. Third, we select seven representative big data workloads by removing redundant ones and release the BigDataBench simulation version, which is publicly available from http://prof.ict.ac.cn/BigDataBench/simulatorversion/.
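The three steps in the abstract (PCA over workload metrics, clustering in the reduced space, and picking one representative per cluster) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not name the clustering algorithm or the component-retention rule, so the k-means routine, the farthest-point initialization, the 90% variance threshold, and the tiny synthetic 12-by-6 metric matrix below are all assumptions chosen for illustration, as are the function names.

```python
import numpy as np

def pca_scores(metrics, var_kept=0.9):
    """Project standardized metrics onto the leading principal components."""
    std = metrics.std(axis=0)
    std[std == 0] = 1.0  # guard against constant metrics
    z = (metrics - metrics.mean(axis=0)) / std
    # SVD of the standardized matrix: rows of vt are the principal directions.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    # Keep the fewest components whose cumulative variance reaches var_kept
    # (assumed retention rule, not taken from the paper).
    k = min(int(np.searchsorted(np.cumsum(explained), var_kept)) + 1, len(s))
    return z @ vt[:k].T, explained[:k]

def farthest_point_init(points, k):
    """Deterministic seeding: start at row 0, then repeatedly add the
    point farthest from all centers chosen so far."""
    chosen = [0]
    while len(chosen) < k:
        d = np.linalg.norm(points[:, None, :] - points[chosen][None, :, :], axis=2)
        chosen.append(int(d.min(axis=1).argmax()))
    return points[chosen].copy()

def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means (assumed clustering technique)."""
    centers = farthest_point_init(points, k)
    for _ in range(iters):
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Recompute labels so they always match the returned centers.
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1), centers

def pick_representatives(points, labels, centers):
    """For each non-empty cluster, return the index of the member closest
    to the cluster center; the remaining members are treated as redundant."""
    reps = []
    for j in range(len(centers)):
        members = np.flatnonzero(labels == j)
        if members.size:
            d = np.linalg.norm(points[members] - centers[j], axis=1)
            reps.append(int(members[d.argmin()]))
    return sorted(reps)

# Synthetic stand-in data (NOT measurements from BigDataBench):
# 12 hypothetical workloads x 6 hypothetical metrics, in three groups of four.
base = np.array([[1, 1, 1, 5, 5, 5],
                 [5, 5, 5, 1, 1, 1],
                 [9, 1, 9, 1, 9, 1]], dtype=float)
rng = np.random.default_rng(0)
metrics = np.repeat(base, 4, axis=0) + rng.normal(scale=0.1, size=(12, 6))

scores, explained = pca_scores(metrics)
labels, centers = kmeans(scores, k=3)
reps = pick_representatives(scores, labels, centers)
print("components kept:", scores.shape[1])
print("representative workload indices:", reps)
```

On real data the metric matrix would have one row per workload and one column per collected metric (45 in the paper), and the number of clusters would be chosen from the data rather than fixed in advance.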

Original language: English
Title of host publication: IISWC 2014 - IEEE International Symposium on Workload Characterization
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 191-201
Number of pages: 11
ISBN (Electronic): 9781479964536
DOI
Publication status: Published - 11 Dec 2014
Externally published
Event: 2014 IEEE International Symposium on Workload Characterization, IISWC 2014 - Raleigh, United States
Duration: 26 Oct 2014 to 28 Oct 2014

Publication series

Name: IISWC 2014 - IEEE International Symposium on Workload Characterization

Conference

Conference: 2014 IEEE International Symposium on Workload Characterization, IISWC 2014
Country/Territory: United States
City: Raleigh
Period: 26/10/14 to 28/10/14
