一种深度强化学习与模仿学习结合的突防策略

Xiaofang Wang; Kunren Gu

doi:10.3873/j.issn.1000-1328.2023.06.011


Translated title of the contribution: A Penetration Strategy Combining Deep Reinforcement Learning and Imitation Learning

Xiaofang Wang, Kunren Gu

School of Aerospace Engineering

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

7 Citations (Scopus)

Abstract

To meet the dual requirements of penetrating the defense and striking the target afterwards when a fighter encounters an interceptor while attacking a target, an intelligent maneuver penetration algorithm for fighters based on deep reinforcement learning and imitation learning is proposed. First, the fighter's maneuver penetration problem is formulated as a Markov decision process, and a reward function is designed that accounts for both penetration and attack through three quantities: the distance between the fighter and the defense missile, the distance between the fighter and the target after penetration, and the fighter's velocity deflection angle relative to the fighter-target line of sight. Then, by combining the Proximal Policy Optimization (PPO) algorithm with imitation learning, a generative adversarial imitation learning-proximal policy optimization (GAIL-PPO) intelligent penetration network is constructed, consisting of a discriminator network, an actor network, and a critic network. Finally, the intelligent penetration network is trained with expert strategies. Simulation results show that the GAIL-PPO penetration strategy converges quickly in the early stage by learning from the experience of the expert strategies, explores the complex environment fully in the later stage, and ultimately outperforms the expert strategies.
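The abstract does not give the reward terms or the training objective in closed form, so the following is only a minimal sketch of how the three ingredients it names could fit together. The function names, weights, distance scale, and the rule for mixing the imitation and environment rewards are all illustrative assumptions, not the authors' formulation. The imitation reward uses the common GAIL form `-log(1 - D(s, a))`, and the policy term is the standard PPO clipped surrogate for a single sample.

```python
import math


def shaped_reward(d_missile, d_target, deflection, penetrated,
                  w_m=1.0, w_t=1.0, w_a=0.5, scale=10_000.0):
    """Hypothetical environment reward built from the three quantities in the abstract.

    d_missile  : fighter-to-defense-missile distance (m)
    d_target   : fighter-to-target distance (m), only used after penetration
    deflection : velocity deflection angle from the fighter-target line of sight (rad)
    All weights and the distance scale are assumed values for illustration.
    """
    r = w_m * math.tanh(d_missile / scale)        # staying far from the interceptor is rewarded
    r -= w_a * abs(deflection) / math.pi          # pointing away from the target is penalized
    if penetrated:
        r -= w_t * math.tanh(d_target / scale)    # after penetration, close the range to the target
    return r


def imitation_reward(d_prob, eps=1e-8):
    """Common GAIL reward -log(1 - D), where d_prob is the discriminator's
    probability that a state-action pair came from the expert."""
    return -math.log(max(1.0 - d_prob, eps))


def mixed_reward(r_env, r_imit, mix):
    """Assumed blending rule: mix near 1 follows the expert, mix near 0 follows the environment."""
    return mix * r_imit + (1.0 - mix) * r_env


def ppo_clip_term(ratio, advantage, clip=0.2):
    """PPO clipped surrogate for one sample; ratio = pi_new(a|s) / pi_old(a|s)."""
    clipped = min(max(ratio, 1.0 - clip), 1.0 + clip)
    return min(ratio * advantage, clipped * advantage)
```

Under these assumptions, decaying `mix` from 1 toward 0 over training would reproduce the qualitative behaviour the abstract reports: early updates lean on the expert demonstrations, later updates are driven by the environment reward.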

Translated title of the contribution	A Penetration Strategy Combining Deep Reinforcement Learning and Imitation Learning
Original language	Chinese (Traditional)
Pages (from-to)	914-925
Number of pages	12
Journal	Yuhang Xuebao/Journal of Astronautics
Volume	44
Issue number	6
DOIs	https://doi.org/10.3873/j.issn.1000-1328.2023.06.011
Publication status	Published - Jun 2023

Cite this

@article{b1f31468d0f04365974858b8e3ef316c,

title = "一种深度强化学习与模仿学习结合的突防策略",

keywords = "Deep reinforcement learning, Fighter Aircraft, Imitative learning, Intelligent Penetration, Maneuver Penetration",

author = "Xiaofang Wang and Kunren Gu",

year = "2023",

month = jun,

doi = "10.3873/j.issn.1000-1328.2023.06.011",

language = "Traditional Chinese",

volume = "44",

pages = "914--925",

journal = "Yuhang Xuebao/Journal of Astronautics",

issn = "1000-1328",

publisher = "Chinese Society of Astronautics",

number = "6",

}

TY - JOUR

T1 - 一种深度强化学习与模仿学习结合的突防策略

AU - Wang, Xiaofang

AU - Gu, Kunren

PY - 2023/6

Y1 - 2023/6

KW - Deep reinforcement learning

KW - Fighter Aircraft

KW - Imitative learning

KW - Intelligent Penetration

KW - Maneuver Penetration

UR - http://www.scopus.com/inward/record.url?scp=85171452086&partnerID=8YFLogxK

U2 - 10.3873/j.issn.1000-1328.2023.06.011

DO - 10.3873/j.issn.1000-1328.2023.06.011

M3 - Article

AN - SCOPUS:85171452086

SN - 1000-1328

VL - 44

SP - 914

EP - 925

JO - Yuhang Xuebao/Journal of Astronautics

JF - Yuhang Xuebao/Journal of Astronautics

IS - 6

ER -
