Learning-Based Policy Optimization for Adversarial Missile-Target Assignment

Weilin Luo; Jinhu Lu; Kexin Liu; Lei Chen

doi:10.1109/TSMC.2021.3096997

Weilin Luo, Jinhu Lu^*, Kexin Liu, Lei Chen

^*Corresponding author for this work

Advanced Research Institute of Multidisciplinary Science

Beihang University

Research output: Contribution to journal › Article › peer-review

Abstract

The missile-target assignment (MTA) is a typical weapon-target assignment problem in Command and Control of modern warfare. Despite the significance of the problem, traditional algorithms still lack efficiency, solution quality, and practicability in the adversarial environment. In this article, we propose a data-driven policy optimization with deep reinforcement learning (PODRL) for the adversarial MTA. We design a comprehensive reward function to motivate the optimization of assignment policy. As such, the learned policy can implicitly model the penetration of missiles under an adversarial environment in a data-driven way. We also present a fair sample strategy to improve the sample efficiency and accelerate the policy optimization. Experimental results show that PODRL can adaptively generate satisfactory solutions in both small-scale and large-scale instances. Furthermore, we evaluate the effectiveness of PODRL in a multiobjective scenario. The result demonstrates that a well-optimized policy can achieve high-quality allocation and demand forecast of the missile resources simultaneously.

Original language	English
Pages (from-to)	4426-4437
Number of pages	12
Journal	IEEE Transactions on Systems, Man, and Cybernetics: Systems
Volume	52
Issue number	7
DOIs	https://doi.org/10.1109/TSMC.2021.3096997
Publication status	Published - 1 Jul 2022

Keywords

Adversarial environment
deep Q-learning with fair sample
deep reinforcement learning (DRL)
missile-target assignment (MTA)
policy optimization

Cite this

Luo, W., Lu, J., Liu, K., & Chen, L. (2022). Learning-Based Policy Optimization for Adversarial Missile-Target Assignment. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 52(7), 4426-4437. https://doi.org/10.1109/TSMC.2021.3096997

@article{7d9f9b538a9849dfa82b305fe206cf90,

title = "Learning-Based Policy Optimization for Adversarial Missile-Target Assignment",

abstract = "The missile-target assignment (MTA) is a typical weapon-target assignment problem in Command and Control of modern warfare. Despite the significance of the problem, traditional algorithms still lack efficiency, solution quality, and practicability in the adversarial environment. In this article, we propose a data-driven policy optimization with deep reinforcement learning (PODRL) for the adversarial MTA. We design a comprehensive reward function to motivate the optimization of assignment policy. As such, the learned policy can implicitly model the penetration of missiles under an adversarial environment in a data-driven way. We also present a fair sample strategy to improve the sample efficiency and accelerate the policy optimization. Experimental results show that PODRL can adaptively generate satisfactory solutions in both small-scale and large-scale instances. Furthermore, we evaluate the effectiveness of PODRL in a multiobjective scenario. The result demonstrates that a well-optimized policy can achieve high-quality allocation and demand forecast of the missile resources simultaneously.",

keywords = "Adversarial environment, deep Q-learning with fair sample, deep reinforcement learning (DRL), missile-target assignment (MTA), policy optimization",

author = "Weilin Luo and Jinhu Lu and Kexin Liu and Lei Chen",

note = "Publisher Copyright: {\textcopyright} 2013 IEEE.",

year = "2022",

month = jul,

day = "1",

doi = "10.1109/TSMC.2021.3096997",

language = "English",

volume = "52",

pages = "4426--4437",

journal = "IEEE Transactions on Systems, Man, and Cybernetics: Systems",

issn = "2168-2216",

publisher = "IEEE Advancing Technology for Humanity",

number = "7",

}

TY - JOUR

T1 - Learning-Based Policy Optimization for Adversarial Missile-Target Assignment

AU - Luo, Weilin

AU - Lu, Jinhu

AU - Liu, Kexin

AU - Chen, Lei

PY - 2022/7/1

Y1 - 2022/7/1

AB - The missile-target assignment (MTA) is a typical weapon-target assignment problem in Command and Control of modern warfare. Despite the significance of the problem, traditional algorithms still lack efficiency, solution quality, and practicability in the adversarial environment. In this article, we propose a data-driven policy optimization with deep reinforcement learning (PODRL) for the adversarial MTA. We design a comprehensive reward function to motivate the optimization of assignment policy. As such, the learned policy can implicitly model the penetration of missiles under an adversarial environment in a data-driven way. We also present a fair sample strategy to improve the sample efficiency and accelerate the policy optimization. Experimental results show that PODRL can adaptively generate satisfactory solutions in both small-scale and large-scale instances. Furthermore, we evaluate the effectiveness of PODRL in a multiobjective scenario. The result demonstrates that a well-optimized policy can achieve high-quality allocation and demand forecast of the missile resources simultaneously.

KW - Adversarial environment

KW - deep Q-learning with fair sample

KW - deep reinforcement learning (DRL)

KW - missile-target assignment (MTA)

KW - policy optimization

UR - http://www.scopus.com/inward/record.url?scp=85112603512&partnerID=8YFLogxK

U2 - 10.1109/TSMC.2021.3096997

DO - 10.1109/TSMC.2021.3096997

M3 - Article

AN - SCOPUS:85112603512

SN - 2168-2216

VL - 52

SP - 4426

EP - 4437

JO - IEEE Transactions on Systems, Man, and Cybernetics: Systems

JF - IEEE Transactions on Systems, Man, and Cybernetics: Systems

IS - 7

ER -
