TY - GEN
T1 - Delayed Soft Actor-Critic Based Path Planning Method for UAV in Dense Obstacles Environment
AU - Zhong, Jianxin
AU - Long, Teng
AU - Sun, JingLiang
AU - Li, Junzhi
AU - Cao, Yan
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - In order to improve the convergence performance of the soft actor-critic (SAC) algorithm in path planning problems, a delayed prioritized experience replay soft actor-critic (DPERSAC) algorithm is proposed, in which a novel non-uniform experience replay mechanism is designed to decrease the convergence time. A mathematical path planning model is built for unmanned aerial vehicles (UAVs) subject to flight performance and obstacle avoidance constraints. Then the three typical elements of SAC are customized to satisfy the requirements of UAV path planning. Unlike the traditional update scheme, in which the soft Q-function network and the policy network are updated recursively, in this paper the soft Q-function network is first updated conditionally, and the policy network is subsequently iterated based on the trained soft Q-function. Finally, Monte Carlo simulation results demonstrate that the computational time of the proposed DPERSAC method is only 4% of that of the rolling-based sparse A* algorithm in dense obstacle environments.
AB - In order to improve the convergence performance of the soft actor-critic (SAC) algorithm in path planning problems, a delayed prioritized experience replay soft actor-critic (DPERSAC) algorithm is proposed, in which a novel non-uniform experience replay mechanism is designed to decrease the convergence time. A mathematical path planning model is built for unmanned aerial vehicles (UAVs) subject to flight performance and obstacle avoidance constraints. Then the three typical elements of SAC are customized to satisfy the requirements of UAV path planning. Unlike the traditional update scheme, in which the soft Q-function network and the policy network are updated recursively, in this paper the soft Q-function network is first updated conditionally, and the policy network is subsequently iterated based on the trained soft Q-function. Finally, Monte Carlo simulation results demonstrate that the computational time of the proposed DPERSAC method is only 4% of that of the rolling-based sparse A* algorithm in dense obstacle environments.
KW - Flight Path Planning
KW - Prioritized Experience Replay
KW - Reinforcement Learning
KW - Soft Actor-Critic
UR - http://www.scopus.com/inward/record.url?scp=85173820876&partnerID=8YFLogxK
U2 - 10.1109/ICCSSE59359.2023.10245929
DO - 10.1109/ICCSSE59359.2023.10245929
M3 - Conference contribution
AN - SCOPUS:85173820876
T3 - 2023 9th International Conference on Control Science and Systems Engineering, ICCSSE 2023
SP - 172
EP - 177
BT - 2023 9th International Conference on Control Science and Systems Engineering, ICCSSE 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 9th International Conference on Control Science and Systems Engineering, ICCSSE 2023
Y2 - 16 June 2023 through 18 June 2023
ER -
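
The abstract's two core ideas (non-uniform, prioritized sampling from the replay buffer, and a delayed update order in which the soft Q-function is trained every step while the policy is iterated only at intervals on the trained Q-function) can be sketched as follows. This is a minimal illustrative sketch and not the authors' implementation: the class and variable names (PrioritizedBuffer, policy_delay), the network sizes, and the simplified one-step Q target (reward only, no entropy term or target network) are all assumptions made for brevity.

# Hypothetical sketch of delayed policy updates with prioritized replay in a
# SAC-style learner. All names and hyperparameters are illustrative
# assumptions, not taken from the paper.
import random
import numpy as np
import torch
import torch.nn as nn

class PrioritizedBuffer:
    """Proportional prioritized experience replay (non-uniform sampling)."""
    def __init__(self, capacity, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.priorities = [], []

    def push(self, transition):
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        # New transitions get the current max priority so each is seen at least once.
        self.priorities.append(max(self.priorities, default=1.0))

    def sample(self, batch_size):
        probs = np.array(self.priorities) ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        return idx, [self.data[i] for i in idx]

    def update_priorities(self, idx, td_errors):
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(float(err)) + 1e-6

obs_dim, act_dim, policy_delay = 8, 2, 3  # assumed sizes and delay interval
q_net = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))
pi_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim), nn.Tanh())
q_opt = torch.optim.Adam(q_net.parameters(), lr=3e-4)
pi_opt = torch.optim.Adam(pi_net.parameters(), lr=3e-4)

buffer = PrioritizedBuffer(capacity=10_000)
for _ in range(256):  # fill the buffer with random transitions for the demo
    s, a = torch.randn(obs_dim), torch.randn(act_dim)
    buffer.push((s, a, random.random(), torch.randn(obs_dim)))

for step in range(100):
    idx, batch = buffer.sample(32)
    s = torch.stack([b[0] for b in batch])
    a = torch.stack([b[1] for b in batch])
    r = torch.tensor([b[2] for b in batch]).unsqueeze(1)
    # Soft Q update every step; the target is simplified here to the reward.
    q = q_net(torch.cat([s, a], dim=1))
    td_error = r - q
    q_loss = td_error.pow(2).mean()
    q_opt.zero_grad()
    q_loss.backward()
    q_opt.step()
    buffer.update_priorities(idx, td_error.detach().squeeze(1))
    # Delayed policy update: iterate the actor only every policy_delay steps,
    # against the more thoroughly trained soft Q-function.
    if step % policy_delay == 0:
        pi_loss = -q_net(torch.cat([s, pi_net(s)], dim=1)).mean()
        pi_opt.zero_grad()
        pi_loss.backward()
        pi_opt.step()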