Memory-based deep reinforcement learning for cognitive radar target tracking waveform resource management

Jiahao Qin; Mengtao Zhu; Zesi Pan; Yunjie Li; Yan Li

doi:10.1049/rsn2.12469

Jiahao Qin, Mengtao Zhu, Zesi Pan, Yunjie Li, Yan Li^*

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)

Abstract

A cognitive radar (CR) system can offer enhanced target tracking performance due to its intelligence on the perception-action cycle, wherein a CR adaptively allocates the limited transmitting resources based on its perception of surrounding environments. To effectively manage the transmit waveform resource for the target tracking task, CR resource management problem is formulated under the partially observable Markov decision process framework. The sequential decision-making and the inherent partial observability for target tracking problem are considered. In the proposed method, a long short-term memory (LSTM)-based twin delayed deep deterministic policy gradient (TD3) algorithm is developed to effectively solve the problem. A reward function is designed considering Haykin's cognitive executive attention mechanism for radar systems such that the CR resource management policy has stability in the decision of transmit waveform, which follows the principle of minimum disturbance. Simulation results demonstrate the superiority of the proposed LSTM memory-based TD3 with improved target tracking performance and increased mean rewards for CR.

Original language	English
Pages (from-to)	1822-1836
Number of pages	15
Journal	IET Radar, Sonar and Navigation
Volume	17
Issue number	12
DOI	https://doi.org/10.1049/rsn2.12469
Publication status	Published - Dec 2023

Cite this

@article{7532502deea9499099b3a3a9842b70b7,

title = "Memory-based deep reinforcement learning for cognitive radar target tracking waveform resource management",

abstract = "A cognitive radar (CR) system can offer enhanced target tracking performance due to its intelligence on the perception-action cycle, wherein a CR adaptively allocates the limited transmitting resources based on its perception of surrounding environments. To effectively manage the transmit waveform resource for the target tracking task, CR resource management problem is formulated under the partially observable Markov decision process framework. The sequential decision-making and the inherent partial observability for target tracking problem are considered. In the proposed method, a long short-term memory (LSTM)-based twin delayed deep deterministic policy gradient (TD3) algorithm is developed to effectively solve the problem. A reward function is designed considering Haykin's cognitive executive attention mechanism for radar systems such that the CR resource management policy has stability in the decision of transmit waveform, which follows the principle of minimum disturbance. Simulation results demonstrate the superiority of the proposed LSTM memory-based TD3 with improved target tracking performance and increased mean rewards for CR.",

keywords = "adaptive radar, decision making, intelligent networks",

author = "Jiahao Qin and Mengtao Zhu and Zesi Pan and Yunjie Li and Yan Li",

note = "Publisher Copyright: {\textcopyright} 2023 The Authors. IET Radar, Sonar \& Navigation published by John Wiley \& Sons Ltd on behalf of The Institution of Engineering and Technology.",

year = "2023",

month = dec,

doi = "10.1049/rsn2.12469",

language = "English",

volume = "17",

pages = "1822--1836",

journal = "IET Radar, Sonar and Navigation",

issn = "1751-8784",

publisher = "John Wiley \& Sons Inc.",

number = "12",

}

TY - JOUR

T1 - Memory-based deep reinforcement learning for cognitive radar target tracking waveform resource management

AU - Qin, Jiahao

AU - Zhu, Mengtao

AU - Pan, Zesi

AU - Li, Yunjie

AU - Li, Yan

PY - 2023/12

Y1 - 2023/12

N2 - A cognitive radar (CR) system can offer enhanced target tracking performance due to its intelligence on the perception-action cycle, wherein a CR adaptively allocates the limited transmitting resources based on its perception of surrounding environments. To effectively manage the transmit waveform resource for the target tracking task, CR resource management problem is formulated under the partially observable Markov decision process framework. The sequential decision-making and the inherent partial observability for target tracking problem are considered. In the proposed method, a long short-term memory (LSTM)-based twin delayed deep deterministic policy gradient (TD3) algorithm is developed to effectively solve the problem. A reward function is designed considering Haykin's cognitive executive attention mechanism for radar systems such that the CR resource management policy has stability in the decision of transmit waveform, which follows the principle of minimum disturbance. Simulation results demonstrate the superiority of the proposed LSTM memory-based TD3 with improved target tracking performance and increased mean rewards for CR.

AB - A cognitive radar (CR) system can offer enhanced target tracking performance due to its intelligence on the perception-action cycle, wherein a CR adaptively allocates the limited transmitting resources based on its perception of surrounding environments. To effectively manage the transmit waveform resource for the target tracking task, CR resource management problem is formulated under the partially observable Markov decision process framework. The sequential decision-making and the inherent partial observability for target tracking problem are considered. In the proposed method, a long short-term memory (LSTM)-based twin delayed deep deterministic policy gradient (TD3) algorithm is developed to effectively solve the problem. A reward function is designed considering Haykin's cognitive executive attention mechanism for radar systems such that the CR resource management policy has stability in the decision of transmit waveform, which follows the principle of minimum disturbance. Simulation results demonstrate the superiority of the proposed LSTM memory-based TD3 with improved target tracking performance and increased mean rewards for CR.

KW - adaptive radar

KW - decision making

KW - intelligent networks

UR - http://www.scopus.com/inward/record.url?scp=85170535805&partnerID=8YFLogxK

U2 - 10.1049/rsn2.12469

DO - 10.1049/rsn2.12469

M3 - Article

AN - SCOPUS:85170535805

SN - 1751-8784

VL - 17

SP - 1822

EP - 1836

JO - IET Radar, Sonar and Navigation

JF - IET Radar, Sonar and Navigation

IS - 12

ER -
