TY - GEN
T1 - A Measurable Framework for Run-time Data Sampling in Large-scale Datacenter
AU - Yan, Hedong
AU - Wen, Shilin
AU - Han, Rui
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/12
Y1 - 2019/12
N2 - In large-scale data centers, collecting run-time data is an effective way to analyze and monitor performance. However, because of the sheer size of data centers, limited computing resources, and low-latency requirements, collecting all of the data is impractical. Sampling a subset of the data is therefore the common approach at present. Existing research focuses on designing efficient sampling methods that reduce resource and time overhead in data centers, but it does not provide a unified, measurable framework to quantify the quality and practicality of sampling methods. In this paper, we propose a measurable framework for general run-time data sampling in large-scale data centers that explicitly models the underlying recovering hypothesis. The proposed framework is mainly composed of four processes: sampling, collecting, recovering, and comparing. It can accurately measure the sampling bias degree. We also design and implement three sampling methods with different recovering hypotheses. The experimental results demonstrate that the proposed framework helps identify a better run-time data sampling method, one with a lower sampling bias degree at the same sampling rate.
AB - In large-scale data centers, collecting run-time data is an effective way to analyze and monitor performance. However, because of the sheer size of data centers, limited computing resources, and low-latency requirements, collecting all of the data is impractical. Sampling a subset of the data is therefore the common approach at present. Existing research focuses on designing efficient sampling methods that reduce resource and time overhead in data centers, but it does not provide a unified, measurable framework to quantify the quality and practicality of sampling methods. In this paper, we propose a measurable framework for general run-time data sampling in large-scale data centers that explicitly models the underlying recovering hypothesis. The proposed framework is mainly composed of four processes: sampling, collecting, recovering, and comparing. It can accurately measure the sampling bias degree. We also design and implement three sampling methods with different recovering hypotheses. The experimental results demonstrate that the proposed framework helps identify a better run-time data sampling method, one with a lower sampling bias degree at the same sampling rate.
KW - large-scale datacenter
KW - measurable framework
KW - recovering hypothesis
KW - run-time data collecting
KW - sampling bias degree
UR - http://www.scopus.com/inward/record.url?scp=85091938745&partnerID=8YFLogxK
U2 - 10.1109/ICSIDP47821.2019.9173399
DO - 10.1109/ICSIDP47821.2019.9173399
M3 - Conference contribution
AN - SCOPUS:85091938745
T3 - ICSIDP 2019 - IEEE International Conference on Signal, Information and Data Processing 2019
BT - ICSIDP 2019 - IEEE International Conference on Signal, Information and Data Processing 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE International Conference on Signal, Information and Data Processing, ICSIDP 2019
Y2 - 11 December 2019 through 13 December 2019
ER -