Integrated Guidance and Control for Missile Using Deep Reinforcement Learning (一种深度强化学习制导控制一体化算法)

doi:10.3873/j.issn.1000-1328.2021.10.010

Pei Pei, Shao Ming He^*, Jiang Wang, De Fu Lin

^*Corresponding author for this work

School of Aerospace Engineering

Beijing Institute of Technology

Research output: Contribution to journal › Article › Peer-reviewed

27 citations (Scopus)

Abstract

This paper proposes an integrated guidance and control algorithm based on deep reinforcement learning. Unlike traditional approaches, which design the guidance loop and the control loop separately, the proposed algorithm lets an agent issue the fin deflection command directly from the observed states of the missile. The agent is trained by deep reinforcement learning. To apply deep reinforcement learning, the integrated guidance and control problem is cast as a Markov decision process, which makes reinforcement learning theory applicable. A heuristic approach is used to shape a reward function that trades off guidance accuracy, energy consumption, and interception time. The deep deterministic policy gradient algorithm is used to learn an action policy that maps the observed states to a fin deflection command. Extensive numerical simulations are performed to validate the effectiveness and robustness of the proposed algorithm.
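The abstract does not give the reward function itself, so the following is only a minimal sketch of how a shaped reward could trade off accuracy, energy, and time. It assumes a planar engagement with constant relative velocity for the zero-effort-miss term (one of the listed keywords); the function names and the weights `w_zem`, `w_energy`, and `w_time` are hypothetical and are not taken from the paper.

```python
import math


def zero_effort_miss(rel_pos, rel_vel):
    """Miss distance if neither vehicle maneuvers from now on.

    Assumes planar motion and constant relative velocity (illustrative only).
    rel_pos, rel_vel: (x, y) of the target relative to the missile.
    """
    rx, ry = rel_pos
    vx, vy = rel_vel
    speed_sq = vx * vx + vy * vy
    if speed_sq == 0.0:
        # No relative motion: the separation never changes.
        return math.hypot(rx, ry)
    # Time of closest approach, clamped to zero when the range is already opening.
    t_go = max(0.0, -(rx * vx + ry * vy) / speed_sq)
    return math.hypot(rx + vx * t_go, ry + vy * t_go)


def shaped_reward(rel_pos, rel_vel, fin_deflection,
                  w_zem=1.0, w_energy=0.1, w_time=0.01):
    """Per-step reward: penalize predicted miss, control effort, and elapsed time.

    The weights are hypothetical tuning knobs, not values from the paper.
    """
    accuracy_cost = w_zem * zero_effort_miss(rel_pos, rel_vel)
    energy_cost = w_energy * fin_deflection ** 2
    time_cost = w_time  # constant penalty per step encourages faster interception
    return -(accuracy_cost + energy_cost + time_cost)
```

In a setup like this, an actor network trained with deep deterministic policy gradient would output `fin_deflection` from the observed states, and the relative weights would decide how strongly the learned policy favors a small miss distance over low control effort or short flight time.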

Translated title of the contribution	Integrated Guidance and Control for Missile Using Deep Reinforcement Learning
Original language	Traditional Chinese
Pages (from-to)	1293-1304
Number of pages	12
Journal	Yuhang Xuebao/Journal of Astronautics
Volume	42
Issue number	10
DOI	https://doi.org/10.3873/j.issn.1000-1328.2021.10.010
Publication status	Published - 30 Oct 2021

Keywords

Deep deterministic policy gradient
Deep reinforcement learning
Heuristic learning
Integrated guidance and control
Zero-effort-miss

Cite this

@article{b46dd014dd044b0db37ceae920f7f6e7,

title = "一种深度强化学习制导控制一体化算法",

keywords = "Deep deterministic policy gradient, Deep reinforcement learning, Heuristic learning, Integrated guidance and control, Zero-effort-miss",

author = "Pei Pei and He, {Shao Ming} and Jiang Wang and Lin, {De Fu}",

year = "2021",

month = oct,

day = "30",

doi = "10.3873/j.issn.1000-1328.2021.10.010",

language = "Traditional Chinese",

volume = "42",

pages = "1293--1304",

journal = "Yuhang Xuebao/Journal of Astronautics",

issn = "1000-1328",

publisher = "Chinese Society of Astronautics",

number = "10",

}

TY - JOUR

T1 - 一种深度强化学习制导控制一体化算法

AU - Pei, Pei

AU - He, Shao Ming

AU - Wang, Jiang

AU - Lin, De Fu

PY - 2021/10/30

Y1 - 2021/10/30

KW - Deep deterministic policy gradient

KW - Deep reinforcement learning

KW - Heuristic learning

KW - Integrated guidance and control

KW - Zero-effort-miss

UR - http://www.scopus.com/inward/record.url?scp=85120671453&partnerID=8YFLogxK

U2 - 10.3873/j.issn.1000-1328.2021.10.010

DO - 10.3873/j.issn.1000-1328.2021.10.010

M3 - Article

AN - SCOPUS:85120671453

SN - 1000-1328

VL - 42

SP - 1293

EP - 1304

JO - Yuhang Xuebao/Journal of Astronautics

JF - Yuhang Xuebao/Journal of Astronautics

IS - 10

ER -
