TY - JOUR
T1 - Security Correction Control of Power System Based on Deep Reinforcement Learning
AU - Wang, Yidi
AU - Li, Lixin
AU - Yu, Yijun
AU - Yang, Nan
AU - Liu, Meng
AU - Li, Tong
N1 - Publisher Copyright:
© 2023 Automation of Electric Power Systems Press. All rights reserved.
PY - 2023
Y1 - 2023
N2 - In the new power system, uncertainty on both the source and load sides significantly increases power flow fluctuation. Security correction control can eliminate power flow over-limits and ensure the safe operation of the power grid. However, traditional security correction control methods involve numerous constraints and complex computation, making real-time multi-step decisions difficult for large-scale power grids. Therefore, this paper proposes a two-stage training method based on the deep deterministic policy gradient (DDPG) to determine the security correction control strategy. First, by combining the security correction control problem with deep reinforcement learning, a Markov decision process (MDP) model of security correction is constructed through the design of the state, action, and reward function. Second, a two-stage training framework is proposed to obtain the optimal correction strategy. In the imitation learning pre-training stage, imitation learning based on an expert strategy provides the initial neural network for the agent and improves training speed. In the reinforcement learning training stage, the agent is further trained through continuous interaction between the DDPG agent and the environment. The trained agent can be applied in real time to obtain the optimal decision. Finally, the effectiveness of the proposed method is verified by a simulation case based on a provincial power grid of China.
KW - deep reinforcement learning
KW - imitation learning
KW - power flow over-limit
KW - security correction control
UR - http://www.scopus.com/inward/record.url?scp=85164263152&partnerID=8YFLogxK
U2 - 10.7500/AEPS20220706006
DO - 10.7500/AEPS20220706006
M3 - Article
AN - SCOPUS:85164263152
SN - 1000-1026
VL - 47
SP - 121
EP - 129
JO - Dianli Xitong Zidonghua/Automation of Electric Power Systems
JF - Dianli Xitong Zidonghua/Automation of Electric Power Systems
IS - 12
ER -