Abstract
To address the problem of manoeuvring decision-making in UAV air combat, this study establishes a one-to-one air combat model, defines missile attack zones, and constructs a decision model for the manoeuvring process using the Soft Actor-Critic (SAC) deep reinforcement learning algorithm, which learns a stochastic (non-deterministic) policy. The computational complexity of the proposed algorithm is analysed, and the stability of the closed-loop air combat decision-making system controlled by the neural network is established using a Lyapunov function. The study further formulates the UAV air combat process as a game and proposes a Parallel Self-Play SAC training algorithm (PSP-SAC) to improve the generalisation of UAV control decisions. Simulation results show that the proposed algorithm enables sample sharing and policy sharing across multiple combat environments and significantly improves the generalisation ability of the model compared to independent training.
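The sample-sharing and policy-sharing idea behind parallel self-play training can be sketched as below. This is a minimal illustrative skeleton, not the paper's implementation: the class names, the toy linear "policy", the placeholder update rule, and the toy reward are all assumptions standing in for a real SAC actor-critic and air combat environment.

```python
import random

class SharedReplayBuffer:
    """One buffer collecting transitions from every parallel combat environment
    (the 'sample sharing' part of parallel self-play)."""
    def __init__(self, capacity=10000):
        self.capacity = capacity
        self.data = []

    def add(self, transition):
        # Drop the oldest transition once the buffer is full.
        if len(self.data) >= self.capacity:
            self.data.pop(0)
        self.data.append(transition)

    def sample(self, batch_size):
        return random.sample(self.data, min(batch_size, len(self.data)))

class SharedPolicy:
    """Stand-in for the SAC actor: one parameter vector used by all workers
    (the 'policy sharing' part). A real SAC actor samples actions from a
    squashed Gaussian and is trained with entropy-regularised gradients."""
    def __init__(self, dim=4):
        self.theta = [0.0] * dim

    def act(self, state):
        # Toy deterministic linear score; illustrative only.
        return sum(t * s for t, s in zip(self.theta, state))

    def update(self, batch, lr=0.01):
        # Placeholder update: nudge parameters toward high-reward states.
        for state, action, reward in batch:
            for i, s in enumerate(state):
                self.theta[i] += lr * reward * s

def parallel_self_play(n_envs=4, steps=50):
    """Each step, every environment contributes a transition to the shared
    buffer, then the single shared policy is updated from a mixed batch."""
    buffer = SharedReplayBuffer()
    policy = SharedPolicy()
    for _ in range(steps):
        for _env_id in range(n_envs):
            state = [random.uniform(-1, 1) for _ in range(4)]  # toy state
            action = policy.act(state)
            reward = -abs(action)  # toy reward, placeholder for combat advantage
            buffer.add((state, action, reward))
        policy.update(buffer.sample(32))
    return buffer, policy
```

Because every environment writes into the same buffer and reads the same parameters, experience gathered in one combat scenario immediately shapes behaviour in all the others, which is the mechanism the abstract credits for the improved generalisation over independent training.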
Original language | English |
---|---|
Pages (from-to) | 64-81 |
Number of pages | 18 |
Journal | CAAI Transactions on Intelligence Technology |
Volume | 8 |
Issue number | 1 |
DOIs | |
Publication status | Published - Mar 2023 |
Keywords
- SAC algorithm
- UAV
- air combat decision
- deep reinforcement learning
- parallel self-play