Optimization of high-speed fixed-wing UAV penetration strategy based on deep reinforcement learning

Xing Zhuang; Dongguang Li; Yue Wang; Xinyu Liu; Hanyu Li

doi:10.1016/j.ast.2024.109089

Xing Zhuang, Dongguang Li, Yue Wang^*, Xinyu Liu, Hanyu Li

^*Corresponding author for this work

School of Mechatronical Engineering

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

The penetration decision of unmanned aerial vehicles (UAVs) is one of the key components for UAVs to detect or strike crucial targets in adversarial environments. Presently, most methods lack research on penetration decisions under unknown interception information. The executing UAV for decision-making is generally a simplified three-degree-of-freedom model. The decision format is often simplified to path point optimization, lacking control over the stability of complex model attitudes and exhibiting relatively poor robustness. This paper proposes a penetration relative motion theory and aircraft control method based on the six-degrees-of-freedom UAV model to reduce response errors between algorithm control decisions and actual flight control. Using a Markov decision process, a UAV penetration decision control method based on an attitude-overload control loop is designed. This method combines reinforcement learning techniques to enable autonomous penetration decision-making and control for UAVs facing active defense scenarios, enhancing the autonomy, accuracy, and generalization of UAV online decision-making. The paper concludes by implementing autonomous decision-making and control for UAV penetration through the integration of interceptor trajectory prediction with proximal policy optimization. This method reduces the dependency of penetration strategies on interceptor information, ensuring both the generalization ability of penetration strategy models under uncertain information and the consistency of decision stability. Through simulations and experiments, we verified that the penetration strategy based on the pre-sampling PPO method improves the penetration effect by 15.44 % on average and the hit rate by 18.48 %. Moreover, the probability of penetration in different scenarios exceeds 70 %, and the penetration effect is improved by more than 10 % in the case of multiple interceptors. This paper also analyzes the impact of interceptor spatial distribution and the quantity of interceptors on the effectiveness of UAV penetration decision-making.
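
For readers unfamiliar with proximal policy optimization (PPO), which the abstract names as the learning method, the following is a minimal sketch of the standard clipped surrogate objective. It is the generic textbook form, not the paper's pre-sampling PPO variant, and the function name, argument names, and the default clipping range of 0.2 are assumptions chosen for illustration.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (to be maximised).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same actions.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

The clipping removes the incentive to move the policy far from the one that collected the data in a single update, which is the usual reason PPO is chosen for control tasks where unstable updates are costly.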

Original language	English
Article number	109089
Journal	Aerospace Science and Technology
Volume	148
DOIs	https://doi.org/10.1016/j.ast.2024.109089
Publication status	Published - May 2024

Keywords

Guidance control
Maneuvering strategy
Penetration defense
Reinforcement learning
UAV

Cite this

@article{cb6962a2ec8e4e3d83bcebd77f608b17,

title = "Optimization of high-speed fixed-wing UAV penetration strategy based on deep reinforcement learning",

abstract = "The penetration decision of unmanned aerial vehicles (UAVs) is one of the key components for UAVs to detect or strike crucial targets in adversarial environments. Presently, most methods lack research on penetration decisions under unknown interception information. The executing UAV for decision-making is generally a simplified three-degree-of-freedom model. The decision format is often simplified to path point optimization, lacking control over the stability of complex model attitudes and exhibiting relatively poor robustness. This paper proposes a penetration relative motion theory and aircraft control method based on the six-degrees-of-freedom UAV model to reduce response errors between algorithm control decisions and actual flight control. Using a Markov decision process, a UAV penetration decision control method based on an attitude-overload control loop is designed. This method combines reinforcement learning techniques to enable autonomous penetration decision-making and control for UAVs facing active defense scenarios, enhancing the autonomy, accuracy, and generalization of UAV online decision-making. The paper concludes by implementing autonomous decision-making and control for UAV penetration through the integration of interceptor trajectory prediction with proximal policy optimization. This method reduces the dependency of penetration strategies on interceptor information, ensuring both the generalization ability of penetration strategy models under uncertain information and the consistency of decision stability. Through simulations and experiments, we verified that the penetration strategy based on the pre-sampling PPO method improves the penetration effect by 15.44 % on average and the hit rate by 18.48 %. Moreover, the probability of penetration in different scenarios exceeds 70 %, and the penetration effect is improved by more than 10 % in the case of multiple interceptors. 
This paper also analyzes the impact of interceptor spatial distribution and the quantity of interceptors on the effectiveness of UAV penetration decision-making.",

keywords = "Guidance control, Maneuvering strategy, Penetration defense, Reinforcement learning, UAV",

author = "Xing Zhuang and Dongguang Li and Yue Wang and Xinyu Liu and Hanyu Li",

note = "Publisher Copyright: {\textcopyright} 2024 Elsevier Masson SAS",

year = "2024",

month = may,

doi = "10.1016/j.ast.2024.109089",

language = "English",

volume = "148",

journal = "Aerospace Science and Technology",

issn = "1270-9638",

publisher = "Elsevier Masson s.r.l.",

}

TY - JOUR

T1 - Optimization of high-speed fixed-wing UAV penetration strategy based on deep reinforcement learning

AU - Zhuang, Xing

AU - Li, Dongguang

AU - Wang, Yue

AU - Liu, Xinyu

AU - Li, Hanyu

PY - 2024/5

Y1 - 2024/5

N2 - The penetration decision of unmanned aerial vehicles (UAVs) is one of the key components for UAVs to detect or strike crucial targets in adversarial environments. Presently, most methods lack research on penetration decisions under unknown interception information. The executing UAV for decision-making is generally a simplified three-degree-of-freedom model. The decision format is often simplified to path point optimization, lacking control over the stability of complex model attitudes and exhibiting relatively poor robustness. This paper proposes a penetration relative motion theory and aircraft control method based on the six-degrees-of-freedom UAV model to reduce response errors between algorithm control decisions and actual flight control. Using a Markov decision process, a UAV penetration decision control method based on an attitude-overload control loop is designed. This method combines reinforcement learning techniques to enable autonomous penetration decision-making and control for UAVs facing active defense scenarios, enhancing the autonomy, accuracy, and generalization of UAV online decision-making. The paper concludes by implementing autonomous decision-making and control for UAV penetration through the integration of interceptor trajectory prediction with proximal policy optimization. This method reduces the dependency of penetration strategies on interceptor information, ensuring both the generalization ability of penetration strategy models under uncertain information and the consistency of decision stability. Through simulations and experiments, we verified that the penetration strategy based on the pre-sampling PPO method improves the penetration effect by 15.44 % on average and the hit rate by 18.48 %. Moreover, the probability of penetration in different scenarios exceeds 70 %, and the penetration effect is improved by more than 10 % in the case of multiple interceptors. This paper also analyzes the impact of interceptor spatial distribution and the quantity of interceptors on the effectiveness of UAV penetration decision-making.

AB - The penetration decision of unmanned aerial vehicles (UAVs) is one of the key components for UAVs to detect or strike crucial targets in adversarial environments. Presently, most methods lack research on penetration decisions under unknown interception information. The executing UAV for decision-making is generally a simplified three-degree-of-freedom model. The decision format is often simplified to path point optimization, lacking control over the stability of complex model attitudes and exhibiting relatively poor robustness. This paper proposes a penetration relative motion theory and aircraft control method based on the six-degrees-of-freedom UAV model to reduce response errors between algorithm control decisions and actual flight control. Using a Markov decision process, a UAV penetration decision control method based on an attitude-overload control loop is designed. This method combines reinforcement learning techniques to enable autonomous penetration decision-making and control for UAVs facing active defense scenarios, enhancing the autonomy, accuracy, and generalization of UAV online decision-making. The paper concludes by implementing autonomous decision-making and control for UAV penetration through the integration of interceptor trajectory prediction with proximal policy optimization. This method reduces the dependency of penetration strategies on interceptor information, ensuring both the generalization ability of penetration strategy models under uncertain information and the consistency of decision stability. Through simulations and experiments, we verified that the penetration strategy based on the pre-sampling PPO method improves the penetration effect by 15.44 % on average and the hit rate by 18.48 %. Moreover, the probability of penetration in different scenarios exceeds 70 %, and the penetration effect is improved by more than 10 % in the case of multiple interceptors. This paper also analyzes the impact of interceptor spatial distribution and the quantity of interceptors on the effectiveness of UAV penetration decision-making.

KW - Guidance control

KW - Maneuvering strategy

KW - Penetration defense

KW - Reinforcement learning

KW - UAV

UR - http://www.scopus.com/inward/record.url?scp=85189663621&partnerID=8YFLogxK

U2 - 10.1016/j.ast.2024.109089

DO - 10.1016/j.ast.2024.109089

M3 - Article

AN - SCOPUS:85189663621

SN - 1270-9638

VL - 148

JO - Aerospace Science and Technology

JF - Aerospace Science and Technology

M1 - 109089

ER -
