A Measurable Framework for Run-time Data Sampling in Large-scale Datacenter

Hedong Yan; Shilin Wen; Rui Han

doi:10.1109/ICSIDP47821.2019.9173399

School of Computer Science and Technology

Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In large-scale data centers, collecting run-time data is an effective way to analyze and monitor performance. However, because of the sheer size of data centers, limited computing resources, and low-latency requirements, collecting all of the data is impractical. Sampling a subset of the data is therefore the common approach. Existing research focuses on designing efficient sampling methods that reduce resource and time overhead in data centers, but it does not provide a unified, measurable framework to quantify the quality and practicality of sampling methods. In this paper, we propose a measurable framework for general run-time data sampling in large-scale data centers that explicitly models the underlying recovering hypothesis. The framework consists of four processes: sampling, collecting, recovering, and comparing. It can measure the sampling bias degree accurately. We design and implement three sampling methods with different recovering hypotheses. Experimental results demonstrate that the framework helps identify a better run-time data sampling method, one that has a lower sampling bias degree at the same sampling rate.
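The four processes named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the sampling bias degree or the three sampling methods, so everything below is an assumption made for illustration. The function names (`sample_every_k`, `recover_hold`, `recover_linear`, `bias_degree`, `evaluate`) are hypothetical, fixed-interval sampling stands in for a sampling method, two simple recovering hypotheses are used, and mean absolute error stands in for the bias metric.

```python
from typing import Callable, List, Tuple

Sample = Tuple[int, float]  # (time index, observed value)


def sample_every_k(series: List[float], k: int) -> List[Sample]:
    """Sampling and collecting: keep every k-th point (sampling rate 1/k)."""
    return [(i, series[i]) for i in range(0, len(series), k)]


def recover_hold(samples: List[Sample], n: int) -> List[float]:
    """Assumed hypothesis A: the value stays constant until the next sample."""
    out: List[float] = []
    j = 0
    for i in range(n):
        # Advance to the latest sample taken at or before time i.
        while j + 1 < len(samples) and samples[j + 1][0] <= i:
            j += 1
        out.append(samples[j][1])
    return out


def recover_linear(samples: List[Sample], n: int) -> List[float]:
    """Assumed hypothesis B: the value changes linearly between samples."""
    out: List[float] = []
    j = 0
    for i in range(n):
        while j + 1 < len(samples) and samples[j + 1][0] <= i:
            j += 1
        t0, v0 = samples[j]
        if j + 1 < len(samples):
            t1, v1 = samples[j + 1]
            out.append(v0 + (v1 - v0) * (i - t0) / (t1 - t0))
        else:
            out.append(v0)  # no later sample exists: hold the last value
    return out


def bias_degree(original: List[float], recovered: List[float]) -> float:
    """Comparing: mean absolute error, a stand-in for the paper's metric."""
    return sum(abs(a - b) for a, b in zip(original, recovered)) / len(original)


def evaluate(series: List[float], k: int,
             recover: Callable[[List[Sample], int], List[float]]) -> float:
    """Run sampling, collecting, recovering, and comparing for one method."""
    samples = sample_every_k(series, k)
    return bias_degree(series, recover(samples, len(series)))


if __name__ == "__main__":
    ramp = [float(i) for i in range(9)]  # synthetic metric: linear ramp 0..8
    print("hold  :", evaluate(ramp, 4, recover_hold))
    print("linear:", evaluate(ramp, 4, recover_linear))
```

On this synthetic ramp, both hypotheses see the same three samples, yet the linear hypothesis recovers the series exactly (bias 0) while the hold hypothesis yields a bias of 12/9, about 1.33. This is the kind of comparison at a fixed sampling rate that the abstract describes.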

Original language	English
Title of host publication	ICSIDP 2019 - IEEE International Conference on Signal, Information and Data Processing 2019
Publisher	Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)	9781728123455
DOIs	https://doi.org/10.1109/ICSIDP47821.2019.9173399
Publication status	Published - Dec 2019
Event	2019 IEEE International Conference on Signal, Information and Data Processing, ICSIDP 2019 - Chongqing, China Duration: 11 Dec 2019 → 13 Dec 2019

Publication series

Name	ICSIDP 2019 - IEEE International Conference on Signal, Information and Data Processing 2019

Conference

Conference	2019 IEEE International Conference on Signal, Information and Data Processing, ICSIDP 2019
Country/Territory	China
City	Chongqing
Period	11/12/19 → 13/12/19

Keywords

large-scale datacenter
measurable framework
recovering hypothesis
run-time data collecting
sampling bias degree

Access to Document

10.1109/ICSIDP47821.2019.9173399

Cite this

Yan, H., Wen, S., & Han, R. (2019). A Measurable Framework for Run-time Data Sampling in Large-scale Datacenter. In ICSIDP 2019 - IEEE International Conference on Signal, Information and Data Processing 2019, Article 9173399 (ICSIDP 2019 - IEEE International Conference on Signal, Information and Data Processing 2019). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICSIDP47821.2019.9173399

@inproceedings{0a3000d22e5d4e5a925a8320ea7f9b71,

title = "A Measurable Framework for Run-time Data Sampling in Large-scale Datacenter",

keywords = "large-scale datacenter, measurable framework, recovering hypothesis, run-time data collecting, sampling bias degree",

author = "Hedong Yan and Shilin Wen and Rui Han",

note = "Publisher Copyright: {\textcopyright} 2019 IEEE.; 2019 IEEE International Conference on Signal, Information and Data Processing, ICSIDP 2019 ; Conference date: 11-12-2019 Through 13-12-2019",

year = "2019",

month = dec,

doi = "10.1109/ICSIDP47821.2019.9173399",

language = "English",

series = "ICSIDP 2019 - IEEE International Conference on Signal, Information and Data Processing 2019",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

booktitle = "ICSIDP 2019 - IEEE International Conference on Signal, Information and Data Processing 2019",

address = "United States",

}

Yan, H, Wen, S & Han, R 2019, A Measurable Framework for Run-time Data Sampling in Large-scale Datacenter. in ICSIDP 2019 - IEEE International Conference on Signal, Information and Data Processing 2019., 9173399, ICSIDP 2019 - IEEE International Conference on Signal, Information and Data Processing 2019, Institute of Electrical and Electronics Engineers Inc., 2019 IEEE International Conference on Signal, Information and Data Processing, ICSIDP 2019, Chongqing, China, 11/12/19. https://doi.org/10.1109/ICSIDP47821.2019.9173399

A Measurable Framework for Run-time Data Sampling in Large-scale Datacenter. / Yan, Hedong; Wen, Shilin; Han, Rui.
ICSIDP 2019 - IEEE International Conference on Signal, Information and Data Processing 2019. Institute of Electrical and Electronics Engineers Inc., 2019. 9173399 (ICSIDP 2019 - IEEE International Conference on Signal, Information and Data Processing 2019).

TY - GEN

T1 - A Measurable Framework for Run-time Data Sampling in Large-scale Datacenter

AU - Yan, Hedong

AU - Wen, Shilin

AU - Han, Rui

PY - 2019/12

Y1 - 2019/12

KW - large-scale datacenter

KW - measurable framework

KW - recovering hypothesis

KW - run-time data collecting

KW - sampling bias degree

UR - http://www.scopus.com/inward/record.url?scp=85091938745&partnerID=8YFLogxK

U2 - 10.1109/ICSIDP47821.2019.9173399

DO - 10.1109/ICSIDP47821.2019.9173399

M3 - Conference contribution

AN - SCOPUS:85091938745

T3 - ICSIDP 2019 - IEEE International Conference on Signal, Information and Data Processing 2019

BT - ICSIDP 2019 - IEEE International Conference on Signal, Information and Data Processing 2019

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2019 IEEE International Conference on Signal, Information and Data Processing, ICSIDP 2019

Y2 - 11 December 2019 through 13 December 2019

ER -

Yan H, Wen S, Han R. A Measurable Framework for Run-time Data Sampling in Large-scale Datacenter. In ICSIDP 2019 - IEEE International Conference on Signal, Information and Data Processing 2019. Institute of Electrical and Electronics Engineers Inc. 2019. 9173399. (ICSIDP 2019 - IEEE International Conference on Signal, Information and Data Processing 2019). doi: 10.1109/ICSIDP47821.2019.9173399
