TY - JOUR
T1 - A Reinforcement Learning Method Based on an Improved Sampling Mechanism for Unmanned Aerial Vehicle Penetration
AU - Wang, Yue
AU - Li, Kexv
AU - Zhuang, Xing
AU - Liu, Xinyu
AU - Li, Hanyu
N1 - Publisher Copyright:
© 2023 by the authors.
PY - 2023/7
Y1 - 2023/7
N2 - The penetration of unmanned aerial vehicles (UAVs) is an important aspect of UAV games. In recent years, UAV penetration has generally been solved using artificial intelligence methods such as reinforcement learning. However, the high sample demand of the reinforcement learning method poses a significant challenge specifically in the context of UAV games. To improve the sample utilization in UAV penetration, this paper innovatively proposes an improved sampling mechanism called task completion division (TCD) and combines this method with the soft actor critic (SAC) algorithm to form the TCD-SAC algorithm. To compare the performance of the TCD-SAC algorithm with other related baseline algorithms, this study builds a dynamic environment, a UAV game, and conducts training and testing experiments in this environment. The results show that among all the algorithms, the TCD-SAC algorithm has the highest sample utilization rate and the best actual penetration results, and the algorithm has a good adaptability and robustness in dynamic environments.
AB - The penetration of unmanned aerial vehicles (UAVs) is an important aspect of UAV games. In recent years, UAV penetration has generally been solved using artificial intelligence methods such as reinforcement learning. However, the high sample demand of the reinforcement learning method poses a significant challenge specifically in the context of UAV games. To improve the sample utilization in UAV penetration, this paper innovatively proposes an improved sampling mechanism called task completion division (TCD) and combines this method with the soft actor critic (SAC) algorithm to form the TCD-SAC algorithm. To compare the performance of the TCD-SAC algorithm with other related baseline algorithms, this study builds a dynamic environment, a UAV game, and conducts training and testing experiments in this environment. The results show that among all the algorithms, the TCD-SAC algorithm has the highest sample utilization rate and the best actual penetration results, and the algorithm has a good adaptability and robustness in dynamic environments.
KW - UAV penetration
KW - reinforcement learning
KW - sample utilization
KW - task completion division
UR - http://www.scopus.com/inward/record.url?scp=85165939682&partnerID=8YFLogxK
U2 - 10.3390/aerospace10070642
DO - 10.3390/aerospace10070642
M3 - Article
AN - SCOPUS:85165939682
SN - 2226-4310
VL - 10
JO - Aerospace
JF - Aerospace
IS - 7
M1 - 642
ER -