TY - JOUR
T1 - Understanding big data analytics workloads on modern processors
AU - Jia, Zhen
AU - Zhan, Jianfeng
AU - Wang, Lei
AU - Luo, Chunjie
AU - Gao, Wanling
AU - Jin, Yi
AU - Han, Rui
AU - Zhang, Lixin
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2017/6
Y1 - 2017/6
AB - Big data analytics workloads are a significant class of workloads in modern data centers, and it is increasingly important to characterize representative workloads and understand their behaviors in order to improve the performance of data center computer systems. In this paper, we present a comprehensive study of the impacts and performance implications of big data analytics workloads on systems equipped with modern superscalar out-of-order processors. After investigating the three most important application domains in Internet services in terms of page views and daily visitors, we choose 11 representative data analytics workloads and characterize their micro-architectural behaviors using hardware performance counters. Our study reveals that big data analytics workloads share many inherent characteristics, which place them in a different class from traditional workloads and scale-out services. To further understand these characteristics, we perform correlation analysis to identify the key factors that affect cycles per instruction (CPI). We also show that the increasing complexity of big data software stacks will put greater pressure on modern processor pipelines.
KW - Big data analytics
KW - Micro-architectural characteristics
KW - Performance optimization
KW - Workload characterization
UR - http://www.scopus.com/inward/record.url?scp=85021760383&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2016.2625244
DO - 10.1109/TPDS.2016.2625244
M3 - Article
AN - SCOPUS:85021760383
SN - 1045-9219
VL - 28
SP - 1797
EP - 1810
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 6
M1 - 7736117
ER -