TY - GEN
T1 - A deterministic policy gradient based load control policy in direct current distribution networks
AU - Duan, Hong
AU - Zhou, Xu
AU - Kang, Xianhong
AU - Ma, Zhongjing
N1 - Publisher Copyright:
© WCSE 2019. All rights reserved.
PY - 2020
Y1 - 2020
N2 - Developing algorithms that seek the global optimum of non-convex optimization problems has significant practical potential. Previous research in this field either settles for a local optimum or loses accuracy through convex relaxation. In this paper, we consider a demand-side management (DSM) problem in direct current (DC) distribution networks as an application for studying global optimum seeking in non-convex optimization. Due to the voltage and network constraints, non-convexity appears in the objective function, which accounts for the trade-off between operation costs and users' preferences. Exploiting the freedom to express the learning problem as a non-convex optimization, we explore a deterministic policy gradient (DPG) based algorithm to compute the global optimum. A policy network and a polynomial regression critic are built to learn the optimal policy under exploration noise. Numerical results demonstrate that the DPG algorithm increases the probability of convergence to the global optimum.
AB - Developing algorithms that seek the global optimum of non-convex optimization problems has significant practical potential. Previous research in this field either settles for a local optimum or loses accuracy through convex relaxation. In this paper, we consider a demand-side management (DSM) problem in direct current (DC) distribution networks as an application for studying global optimum seeking in non-convex optimization. Due to the voltage and network constraints, non-convexity appears in the objective function, which accounts for the trade-off between operation costs and users' preferences. Exploiting the freedom to express the learning problem as a non-convex optimization, we explore a deterministic policy gradient (DPG) based algorithm to compute the global optimum. A policy network and a polynomial regression critic are built to learn the optimal policy under exploration noise. Numerical results demonstrate that the DPG algorithm increases the probability of convergence to the global optimum.
KW - Demand-side management (DSM)
KW - Deterministic policy gradient
KW - Distribution networks
KW - Reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85081097981&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85081097981
T3 - Proceedings of 2019 the 9th International Workshop on Computer Science and Engineering, WCSE 2019
SP - 996
EP - 1001
BT - Proceedings of 2019 the 9th International Workshop on Computer Science and Engineering, WCSE 2019
PB - International Workshop on Computer Science and Engineering (WCSE)
T2 - 2019 9th International Workshop on Computer Science and Engineering, WCSE 2019
Y2 - 15 June 2019 through 17 June 2019
ER -