Abstract
Big data analytics workloads are very significant ones in modern data centers, and it is more and more important to characterize their representative workloads and understand their behaviors so as to improve the performance of data center computer systems. In this paper, we embark on a comprehensive study to understand the impacts and performance implications of the big data analytics workloads on the systems equipped with modern superscalar out-of-order processors. After investigating three most important application domains in Internet services in terms of page views and daily visitors, we choose 11 representative data analytics workloads and characterize their micro-architectural behaviors by using hardware performance counters. Our study reveals that the big data analytics workloads share many inherent characteristics, which place them in a different class from the traditional workloads and the scale-out services. To further understand the characteristics of big data analytics workloads, we perform correlation analysis to identify the most key factors that affect cycles per instruction (CPI). Also, we reveal that the increasing complexity of the big data software stacks will put higher pressures on the modern processor pipelines.
Original language | English |
---|---|
Article number | 7736117 |
Pages (from-to) | 1797-1810 |
Number of pages | 14 |
Journal | IEEE Transactions on Parallel and Distributed Systems |
Volume | 28 |
Issue number | 6 |
DOIs | |
Publication status | Published - Jun 2017 |
Externally published | Yes |
Keywords
- Big data analytics
- Micro-architectural characteristics
- Performance optimization
- Workload characterization