TY - JOUR
T1 - Real-time power scheduling through reinforcement learning from demonstrations
AU - Liu, Shaohuai
AU - Liu, Jinbo
AU - Yang, Nan
AU - Huang, Yupeng
AU - Jiang, Qirong
AU - Gao, Yang
N1 - Publisher Copyright:
© 2024 Elsevier B.V.
PY - 2024/10
Y1 - 2024/10
AB - Real-time decision-making in power system scheduling is imperative in response to the increasing integration of renewable energy. This paper proposes GridZero-Imitation (GZ-I), a novel framework that leverages Reinforcement Learning from Demonstration (RLfD) to address complex unit commitment (UC) and optimal power flow (OPF) challenges. Unlike traditional RL approaches, which require complex reward function designs and offer limited performance guarantees, our method employs intuitive rewards and expert demonstrations to regularize RL training. The demonstrations can be collected through asynchronous reanalysis by an expert solver, enabling RL to synergize with expert knowledge. Specifically, we adopt a decoupled training approach that employs two separate policy networks: an RL policy and an expert policy. During the Monte Carlo Tree Search (MCTS) process, action candidates from the expert policy enable a guided search mechanism, which is especially helpful in the early training stage. This framework alleviates the speed bottleneck typical of physics-based solvers in online decision-making, and it also significantly enhances the control performance and convergence speed of RL scheduling agents, as validated by substantial improvements on a 126-node real provincial test case.
KW - Monte Carlo tree search
KW - Predictive control
KW - Real-time scheduling
KW - Reinforcement learning from demonstration
UR - http://www.scopus.com/inward/record.url?scp=85197033253&partnerID=8YFLogxK
U2 - 10.1016/j.epsr.2024.110638
DO - 10.1016/j.epsr.2024.110638
M3 - Article
AN - SCOPUS:85197033253
SN - 0378-7796
VL - 235
JO - Electric Power Systems Research
JF - Electric Power Systems Research
M1 - 110638
ER -