TY - JOUR
T1 - A Q-Learning-Based Cooperative Path Planning Method for Multiple UAVs
AU - Yin, Yiyi
AU - Wang, Xiaofang
AU - Zhou, Jian
N1 - Publisher Copyright:
© 2023 China Ordnance Society. All rights reserved.
PY - 2023/2
Y1 - 2023/2
N2 - To solve the path planning problem of multiple UAVs arriving at a target synchronously, the battlefield environment model and the Markov decision process model of path planning for a single UAV are established, and the optimal path is calculated based on the Q-learning algorithm. With this algorithm, the Q-table is obtained and used to calculate the shortest path of each UAV and the cooperative range. The time-coordinated paths are then obtained by adjusting the action selection strategy of the circumventing UAVs. Considering the collision avoidance problem of multiple UAVs, the partial replanning area is determined by designing retreat parameters, and, based on deep reinforcement learning theory, a neural network is used in place of the Q-table to re-plan the partial paths of the UAVs, which avoids the problem of dimensional explosion. For previously unexplored obstacles, an obstacle matrix is designed based on the idea of artificial potential field theory and superimposed on the original Q-table to realize collision avoidance for such obstacles. The simulation results verify that the proposed reinforcement learning path planning method can obtain coordinated paths with both time coordination and collision avoidance, and that previously unexplored obstacles in the simulation can also be avoided. Compared with the A* algorithm, the proposed method achieves higher efficiency for online application problems.
AB - To solve the path planning problem of multiple UAVs arriving at a target synchronously, the battlefield environment model and the Markov decision process model of path planning for a single UAV are established, and the optimal path is calculated based on the Q-learning algorithm. With this algorithm, the Q-table is obtained and used to calculate the shortest path of each UAV and the cooperative range. The time-coordinated paths are then obtained by adjusting the action selection strategy of the circumventing UAVs. Considering the collision avoidance problem of multiple UAVs, the partial replanning area is determined by designing retreat parameters, and, based on deep reinforcement learning theory, a neural network is used in place of the Q-table to re-plan the partial paths of the UAVs, which avoids the problem of dimensional explosion. For previously unexplored obstacles, an obstacle matrix is designed based on the idea of artificial potential field theory and superimposed on the original Q-table to realize collision avoidance for such obstacles. The simulation results verify that the proposed reinforcement learning path planning method can obtain coordinated paths with both time coordination and collision avoidance, and that previously unexplored obstacles in the simulation can also be avoided. Compared with the A* algorithm, the proposed method achieves higher efficiency for online application problems.
KW - Q-learning
KW - collision avoidance
KW - multiple UAVs
KW - path planning
KW - time coordination
UR - http://www.scopus.com/inward/record.url?scp=85159080617&partnerID=8YFLogxK
U2 - 10.12382/bgxb.2021.0606
DO - 10.12382/bgxb.2021.0606
M3 - Article
AN - SCOPUS:85159080617
SN - 1000-1093
VL - 44
SP - 484
EP - 495
JO - Binggong Xuebao/Acta Armamentarii
JF - Binggong Xuebao/Acta Armamentarii
IS - 2
ER -