Multiple UAVs Path Planning Based on Deep Reinforcement Learning in Communication Denial Environment

Yahao Xu; Yiran Wei; Keyang Jiang; Di Wang; Hongbin Deng

doi:10.3390/math11020405

Yahao Xu, Yiran Wei^*, Keyang Jiang, Di Wang, Hongbin Deng

^*Corresponding author for this work

School of Mechatronical Engineering

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

20 Citations (Scopus)

Abstract

In this paper, we propose C51-Duel-IP (C51 Dueling DQN with Independent Policy), a dynamic-destination path-planning algorithm for autonomous navigation and avoidance of multiple Unmanned Aerial Vehicles (UAVs) in a communication denial environment. The algorithm represents the Q function output by the Dueling network as a Q distribution, which improves how well the Q value is fitted. We also extend the single-step temporal difference (TD) update to an N-step temporal difference update, which addresses the inflexible updates of single-step TD. More importantly, we use an independent policy so that multiple UAVs achieve autonomous avoidance and navigation without any communication with each other. Under communication denial, the independent policy keeps the behavior of the UAVs consistent and avoids greedy behavior by individual UAVs. In the multiple-UAV dynamic destination scenarios, our work covers path planning with take-off from different initial positions and dynamic path planning with take-off from the same initial position. The hardware-in-the-loop (HITL) experiment results show that C51-Duel-IP is much more robust and effective than the original Dueling-IP and DQN-IP algorithms in an urban simulation environment. The independent policy performs similarly to the shared policy, with the significant advantage that it can run in a communication denial environment.
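The abstract names two ingredients that are easy to illustrate in isolation: an N-step TD target and a Q value read off a categorical (C51-style) distribution. The sketch below is not the authors' implementation. It uses the textbook forms of both ideas, and the function names, the discount factor `gamma`, and the support bounds `v_min` and `v_max` are assumptions made for illustration only.

```python
from typing import Sequence


def n_step_return(rewards: Sequence[float], bootstrap_value: float, gamma: float) -> float:
    """Textbook N-step TD target: the discounted sum of the next n rewards
    plus the discounted value estimate of the state reached after n steps.
    With a single reward this reduces to the ordinary one-step TD target."""
    total = 0.0
    for k, reward in enumerate(rewards):
        total += (gamma ** k) * reward
    return total + (gamma ** len(rewards)) * bootstrap_value


def expected_q(probs: Sequence[float], v_min: float, v_max: float) -> float:
    """Scalar Q value of a categorical value distribution: the mean over
    evenly spaced atoms between v_min and v_max (hypothetical bounds).
    C51 conventionally uses 51 atoms; any count >= 2 works here."""
    n_atoms = len(probs)
    if n_atoms < 2:
        raise ValueError("need at least two atoms")
    step = (v_max - v_min) / (n_atoms - 1)
    return sum(p * (v_min + i * step) for i, p in enumerate(probs))


if __name__ == "__main__":
    # Three-step target with made-up rewards and bootstrap value.
    print(n_step_return([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5))
    # Three-atom toy distribution over the support {0, 5, 10}.
    print(expected_q([0.0, 0.5, 0.5], v_min=0.0, v_max=10.0))
```

In a distributional agent the greedy action would be the one whose distribution has the largest `expected_q`. How the paper combines this with the dueling heads and the independent policy is not specified in the abstract and is left out here.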

Original language	English
Article number	405
Journal	Mathematics
Volume	11
Issue number	2
DOI	https://doi.org/10.3390/math11020405
Publication status	Published - Jan 2023

Cite this

Xu, Y., Wei, Y., Jiang, K., Wang, D., & Deng, H. (2023). Multiple UAVs Path Planning Based on Deep Reinforcement Learning in Communication Denial Environment. Mathematics, 11(2), Article 405. https://doi.org/10.3390/math11020405

@article{29db6b6af979410bb239512f97fd05f6,

title = "Multiple UAVs Path Planning Based on Deep Reinforcement Learning in Communication Denial Environment",

abstract = "In this paper, we propose a C51-Duel-IP (C51 Dueling DQN with Independent Policy) dynamic destination path-planning algorithm to solve the problem of autonomous navigation and avoidance of multiple Unmanned Aerial Vehicles (UAVs) in the communication denial environment. Our proposed algorithm expresses the Q function output by the Dueling network as a Q distribution, which improves the fitting ability of the Q value. We also extend the single-step temporal differential (TD) to the N-step timing differential, which solves the problem of inflexible updates of the single-step temporal differential. More importantly, we use an independent policy to achieve autonomous avoidance and navigation of multiple UAVs without any communication with each other. In the case of communication rejection, the independent policy can achieve the consistency of multiple UAVs and avoid the greedy behavior of UAVs. In multiple-UAV dynamic destination scenarios, our work includes path planning, taking off from different initial positions, and dynamic path planning, taking off from the same initial position. The hardware-in-the-loop (HITL) experiment results show that our C51-Duel-IP algorithm is much more robust and effective than the original Dueling-IP and DQN-IP algorithms in an urban simulation environment. Our independent policy algorithm has similar effects as the shared policy but with the significant advantage of running in a communication denial environment.",

keywords = "UAV path planning, communication denial, multi-agent reinforcement learning, visual perception",

author = "Yahao Xu and Yiran Wei and Keyang Jiang and Di Wang and Hongbin Deng",

note = "Publisher Copyright: {\textcopyright} 2023 by the authors.",

year = "2023",

month = jan,

doi = "10.3390/math11020405",

language = "English",

volume = "11",

journal = "Mathematics",

issn = "2227-7390",

publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",

number = "2",

}

TY - JOUR

T1 - Multiple UAVs Path Planning Based on Deep Reinforcement Learning in Communication Denial Environment

AU - Xu, Yahao

AU - Wei, Yiran

AU - Jiang, Keyang

AU - Wang, Di

AU - Deng, Hongbin

PY - 2023/1

Y1 - 2023/1

N2 - In this paper, we propose a C51-Duel-IP (C51 Dueling DQN with Independent Policy) dynamic destination path-planning algorithm to solve the problem of autonomous navigation and avoidance of multiple Unmanned Aerial Vehicles (UAVs) in the communication denial environment. Our proposed algorithm expresses the Q function output by the Dueling network as a Q distribution, which improves the fitting ability of the Q value. We also extend the single-step temporal differential (TD) to the N-step timing differential, which solves the problem of inflexible updates of the single-step temporal differential. More importantly, we use an independent policy to achieve autonomous avoidance and navigation of multiple UAVs without any communication with each other. In the case of communication rejection, the independent policy can achieve the consistency of multiple UAVs and avoid the greedy behavior of UAVs. In multiple-UAV dynamic destination scenarios, our work includes path planning, taking off from different initial positions, and dynamic path planning, taking off from the same initial position. The hardware-in-the-loop (HITL) experiment results show that our C51-Duel-IP algorithm is much more robust and effective than the original Dueling-IP and DQN-IP algorithms in an urban simulation environment. Our independent policy algorithm has similar effects as the shared policy but with the significant advantage of running in a communication denial environment.

AB - In this paper, we propose a C51-Duel-IP (C51 Dueling DQN with Independent Policy) dynamic destination path-planning algorithm to solve the problem of autonomous navigation and avoidance of multiple Unmanned Aerial Vehicles (UAVs) in the communication denial environment. Our proposed algorithm expresses the Q function output by the Dueling network as a Q distribution, which improves the fitting ability of the Q value. We also extend the single-step temporal differential (TD) to the N-step timing differential, which solves the problem of inflexible updates of the single-step temporal differential. More importantly, we use an independent policy to achieve autonomous avoidance and navigation of multiple UAVs without any communication with each other. In the case of communication rejection, the independent policy can achieve the consistency of multiple UAVs and avoid the greedy behavior of UAVs. In multiple-UAV dynamic destination scenarios, our work includes path planning, taking off from different initial positions, and dynamic path planning, taking off from the same initial position. The hardware-in-the-loop (HITL) experiment results show that our C51-Duel-IP algorithm is much more robust and effective than the original Dueling-IP and DQN-IP algorithms in an urban simulation environment. Our independent policy algorithm has similar effects as the shared policy but with the significant advantage of running in a communication denial environment.

KW - UAV path planning

KW - communication denial

KW - multi-agent reinforcement learning

KW - visual perception

UR - http://www.scopus.com/inward/record.url?scp=85146786977&partnerID=8YFLogxK

U2 - 10.3390/math11020405

DO - 10.3390/math11020405

M3 - Article

AN - SCOPUS:85146786977

SN - 2227-7390

VL - 11

JO - Mathematics

JF - Mathematics

IS - 2

M1 - 405

ER -
