TY - JOUR
T1 - Learning-Based Policy Optimization for Adversarial Missile-Target Assignment
AU - Luo, Weilin
AU - Lu, Jinhu
AU - Liu, Kexin
AU - Chen, Lei
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2022/7/1
Y1 - 2022/7/1
AB - The missile-target assignment (MTA) is a typical weapon-target assignment problem in Command and Control of modern warfare. Despite the significance of the problem, traditional algorithms still lack efficiency, solution quality, and practicability in the adversarial environment. In this article, we propose a data-driven policy optimization with deep reinforcement learning (PODRL) for the adversarial MTA. We design a comprehensive reward function to motivate the optimization of assignment policy. As such, the learned policy can implicitly model the penetration of missiles under an adversarial environment in a data-driven way. We also present a fair sample strategy to improve the sample efficiency and accelerate the policy optimization. Experimental results show that PODRL can adaptively generate satisfactory solutions in both small-scale and large-scale instances. Furthermore, we evaluate the effectiveness of PODRL in a multiobjective scenario. The result demonstrates that a well-optimized policy can achieve high-quality allocation and demand forecast of the missile resources simultaneously.
KW - Adversarial environment
KW - deep Q-learning with fair sample
KW - deep reinforcement learning (DRL)
KW - missile-target assignment (MTA)
KW - policy optimization
UR - http://www.scopus.com/inward/record.url?scp=85112603512&partnerID=8YFLogxK
U2 - 10.1109/TSMC.2021.3096997
DO - 10.1109/TSMC.2021.3096997
M3 - Article
AN - SCOPUS:85112603512
SN - 2168-2216
VL - 52
SP - 4426
EP - 4437
JO - IEEE Transactions on Systems, Man, and Cybernetics: Systems
JF - IEEE Transactions on Systems, Man, and Cybernetics: Systems
IS - 7
ER -