TY - JOUR
T1 - Security Correction Control of Power System Based on Deep Reinforcement Learning
AU - Wang, Yidi
AU - Li, Lixin
AU - Yu, Yijun
AU - Yang, Nan
AU - Liu, Meng
AU - Li, Tong
N1 - Publisher Copyright:
© 2023 Automation of Electric Power Systems Press. All rights reserved.
PY - 2023
Y1 - 2023
N2 - In the new power system, uncertainty on both the source and load sides significantly increases power flow fluctuation. Security correction control can eliminate power flow over-limits and ensure the safe operation of the power grid. However, traditional security correction control methods involve numerous constraints and complex computation, making real-time multi-step decisions difficult for large-scale power grids. Therefore, this paper proposes a two-stage training method based on the deep deterministic policy gradient (DDPG) to determine the security correction control strategy. First, by combining the security correction control problem with deep reinforcement learning, a Markov decision process (MDP) model of security correction is constructed through the design of the state, action, and reward function. Second, a two-stage training framework is proposed to obtain the optimal correction strategy. In the imitation learning pre-training stage, imitation learning based on an expert strategy provides the initial neural network for the agent and improves training speed. In the reinforcement learning training stage, the agent is further trained through continuous interaction between the DDPG agent and the environment. The trained agent can be applied in real time to obtain the optimal decision. Finally, the effectiveness of the proposed method is verified by a simulation case based on a provincial power grid of China.
KW - deep reinforcement learning
KW - imitation learning
KW - power flow over-limit
KW - security correction control
UR - http://www.scopus.com/inward/record.url?scp=85164263152&partnerID=8YFLogxK
U2 - 10.7500/AEPS20220706006
DO - 10.7500/AEPS20220706006
M3 - Article
AN - SCOPUS:85164263152
SN - 1000-1026
VL - 47
SP - 121
EP - 129
JO - Dianli Xitong Zidonghua/Automation of Electric Power Systems
JF - Dianli Xitong Zidonghua/Automation of Electric Power Systems
IS - 12
ER -