TY - GEN
T1 - DeepEE: Joint Optimization of Job Scheduling and Cooling Control for Data Center Energy Efficiency
T2 - 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019
AU - Ran, Yongyi
AU - Hu, Han
AU - Zhou, Xin
AU - Wen, Yonggang
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/7
Y1 - 2019/7
AB - The past decade has witnessed tremendous growth in data center power consumption, driven by the rapid development of cloud computing, big data analytics, and machine learning. Prior approaches that optimize the power consumption of the information technology (IT) system and/or the cooling system either fail to capture the system dynamics or suffer from the complexity of the system states and action spaces. In this paper, we propose a Deep Reinforcement Learning (DRL) based optimization framework, named DeepEE, to improve the energy efficiency of data centers by considering the IT and cooling systems concurrently. In DeepEE, we first propose a PArameterized action space based Deep Q-Network (PADQN) algorithm to solve the hybrid action space problem and to jointly optimize job scheduling for the IT system and airflow rate adjustment for the cooling system. Then, a two-time-scale control mechanism is applied in PADQN to coordinate the IT and cooling systems more accurately and efficiently. In addition, to train and evaluate the proposed PADQN safely and quickly, we build a simulation platform that models the dynamics of the IT workload and the cooling system simultaneously. Through extensive real-trace-based simulations, we demonstrate that: 1) our algorithm saves up to 15% and 10% of energy consumption compared with the baseline siloed and joint optimization approaches, respectively; 2) our algorithm achieves a more stable performance gain in terms of power consumption by adopting the parameterized action space; and 3) our algorithm achieves a better tradeoff between energy saving and service quality.
KW - Cooling control
KW - Data center
KW - Deep reinforcement learning
KW - Energy efficiency
KW - Job scheduling
UR - http://www.scopus.com/inward/record.url?scp=85074855104&partnerID=8YFLogxK
U2 - 10.1109/ICDCS.2019.00070
DO - 10.1109/ICDCS.2019.00070
M3 - Conference contribution
AN - SCOPUS:85074855104
T3 - Proceedings - International Conference on Distributed Computing Systems
SP - 645
EP - 655
BT - Proceedings - 2019 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 7 July 2019 through 9 July 2019
ER -