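Research on Multi-Target Assignment Method for Clusters Based on Deep Deterministic Policy Gradient Learning (基于深度确定性梯度学习的集群多目标分配方法)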

Qiaoyi Li; Zhengjie Wang; Xiaoning Zhang; Qiyuan Cheng

doi:10.15918/j.tbit1001-0645.2023.200


Qiaoyi Li, Zhengjie Wang*, Xiaoning Zhang, Qiyuan Cheng

*Corresponding author for this work

School of Mechatronical Engineering

Research output: Contribution to journal › Article › peer-review

Abstract

In multi-missile cooperative target assignment, the number and types of enemy platforms and anti-ship missiles are uncertain, which makes the assignment problem difficult to model. To improve attack effectiveness under highly dynamic cooperative attack conditions, a dynamic battlefield environment model and a single-round Markov decision model for multi-target assignment were established. An improved deep deterministic policy gradient (DDPG) assignment algorithm was proposed that automatically finds the optimal allocation strategy through interaction with a simulator. The algorithm masks the action space so that it can adapt to different numbers and types of platforms. Simulation results show that, across different defense configurations and different red/blue force configurations, the attack strategy obtained by the algorithm performs about 87.5% better than a random strategy, and the model's inference time is about 0.04 ms. This research is expected to accelerate the application of DDPG-based methods to intelligent decision-making in highly dynamic environments and to promote research on autonomous decision-making for clusters.
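The abstract states that the algorithm masks the action space so that one trained policy can cope with varying numbers and types of platforms. The paper's exact formulation is not given in this record, so the following is only a minimal sketch under assumed conventions: the actor emits one score per (missile, target) pair over a fixed maximum number of target slots, and a boolean mask marks which slots are valid (for example, the target exists in the current scenario and its type is compatible). The function names `masked_assignment` and `masked_softmax` are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def masked_assignment(scores: Sequence[Sequence[float]],
                      mask: Sequence[Sequence[bool]]) -> List[int]:
    """Pick one target per missile from actor scores, skipping masked-out slots.

    Hypothetical convention: scores[i][j] is the actor output for missile i
    attacking target slot j; mask[i][j] is True if slot j is a valid choice
    for missile i. Returns the chosen slot per missile, or -1 if none is valid.
    """
    assignment = []
    for row_scores, row_mask in zip(scores, mask):
        # Replace invalid entries with -inf so they can never win the argmax.
        masked = [s if ok else NEG_INF for s, ok in zip(row_scores, row_mask)]
        if not masked:
            assignment.append(-1)
            continue
        best = max(range(len(masked)), key=lambda j: masked[j])
        assignment.append(best if masked[best] != NEG_INF else -1)
    return assignment


def masked_softmax(row_scores: Sequence[float],
                   row_mask: Sequence[bool]) -> List[float]:
    """Softmax over valid slots only; masked slots get probability 0."""
    valid = [s for s, ok in zip(row_scores, row_mask) if ok]
    if not valid:
        return [0.0] * len(row_scores)
    m = max(valid)  # subtract the max for numerical stability
    exps = [math.exp(s - m) if ok else 0.0 for s, ok in zip(row_scores, row_mask)]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    scores = [[0.9, 0.2, 0.7], [0.1, 0.8, 0.3]]
    # Slot 0 is invalid for missile 0 (e.g. incompatible target type).
    mask = [[False, True, True], [True, True, True]]
    print(masked_assignment(scores, mask))  # [2, 1]
```

Because invalid slots are forced to -inf before the argmax (or to probability 0 in the softmax), the network can keep a fixed output size while the set of selectable targets changes from scenario to scenario.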

Translated title of the contribution	Research on Multi-Target Assignment Method for Clusters Based on Deep Deterministic Policy Gradient Learning
Original language	Chinese (Traditional)
Pages (from-to)	1051-1057
Number of pages	7
Journal	Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology
Volume	44
Issue number	10
DOI	https://doi.org/10.15918/j.tbit1001-0645.2023.200
Publication status	Published - Oct 2024

Keywords

deep deterministic policy gradient (DDPG)
dynamic environment
Markov decision model
multi-missile cooperation
target assignment

Access to Document

10.15918/j.tbit1001-0645.2023.200


Cite this

@article{59b8348b24df41788bdaa50e3f40b930,
  title = "基于深度确定性梯度学习的集群多目标分配方法",
  abstract = "In multi-missile cooperative target assignment, the number and types of enemy platforms and anti-ship missiles are uncertain, which makes the assignment problem difficult to model. To improve attack effectiveness under highly dynamic cooperative attack conditions, a dynamic battlefield environment model and a single-round Markov decision model for multi-target assignment were established. An improved deep deterministic policy gradient (DDPG) assignment algorithm was proposed that automatically finds the optimal allocation strategy through interaction with a simulator. The algorithm masks the action space so that it can adapt to different numbers and types of platforms. Simulation results show that, across different defense configurations and different red/blue force configurations, the attack strategy obtained by the algorithm performs about 87.5\% better than a random strategy, and the model's inference time is about 0.04 ms. This research is expected to accelerate the application of DDPG-based methods to intelligent decision-making in highly dynamic environments and to promote research on autonomous decision-making for clusters.",
  keywords = "deep deterministic policy gradient (DDPG), dynamic environment, Markov decision model, multi-missile cooperation, target assignment",
  author = "Qiaoyi Li and Zhengjie Wang and Xiaoning Zhang and Qiyuan Cheng",
  year = "2024",
  month = oct,
  doi = "10.15918/j.tbit1001-0645.2023.200",
  language = "Chinese (Traditional)",
  volume = "44",
  pages = "1051--1057",
  journal = "Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology",
  issn = "1001-0645",
  publisher = "Beijing Institute of Technology",
  number = "10",
}

TY  - JOUR
T1  - 基于深度确定性梯度学习的集群多目标分配方法
AU  - Li, Qiaoyi
AU  - Wang, Zhengjie
AU  - Zhang, Xiaoning
AU  - Cheng, Qiyuan
PY  - 2024/10
Y1  - 2024/10
N2  - In multi-missile cooperative target assignment, the number and types of enemy platforms and anti-ship missiles are uncertain, which makes the assignment problem difficult to model. To improve attack effectiveness under highly dynamic cooperative attack conditions, a dynamic battlefield environment model and a single-round Markov decision model for multi-target assignment were established. An improved deep deterministic policy gradient (DDPG) assignment algorithm was proposed that automatically finds the optimal allocation strategy through interaction with a simulator. The algorithm masks the action space so that it can adapt to different numbers and types of platforms. Simulation results show that, across different defense configurations and different red/blue force configurations, the attack strategy obtained by the algorithm performs about 87.5% better than a random strategy, and the model's inference time is about 0.04 ms. This research is expected to accelerate the application of DDPG-based methods to intelligent decision-making in highly dynamic environments and to promote research on autonomous decision-making for clusters.
AB  - In multi-missile cooperative target assignment, the number and types of enemy platforms and anti-ship missiles are uncertain, which makes the assignment problem difficult to model. To improve attack effectiveness under highly dynamic cooperative attack conditions, a dynamic battlefield environment model and a single-round Markov decision model for multi-target assignment were established. An improved deep deterministic policy gradient (DDPG) assignment algorithm was proposed that automatically finds the optimal allocation strategy through interaction with a simulator. The algorithm masks the action space so that it can adapt to different numbers and types of platforms. Simulation results show that, across different defense configurations and different red/blue force configurations, the attack strategy obtained by the algorithm performs about 87.5% better than a random strategy, and the model's inference time is about 0.04 ms. This research is expected to accelerate the application of DDPG-based methods to intelligent decision-making in highly dynamic environments and to promote research on autonomous decision-making for clusters.
KW  - deep deterministic policy gradient (DDPG)
KW  - dynamic environment
KW  - Markov decision model
KW  - multi-missile cooperation
KW  - target assignment
UR  - http://www.scopus.com/inward/record.url?scp=85209129997&partnerID=8YFLogxK
U2  - 10.15918/j.tbit1001-0645.2023.200
DO  - 10.15918/j.tbit1001-0645.2023.200
M3  - Article
AN  - SCOPUS:85209129997
SN  - 1001-0645
VL  - 44
SP  - 1051
EP  - 1057
JO  - Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology
JF  - Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology
IS  - 10
ER  - 
