Dynamic Spectrum Anti-Jamming With Reinforcement Learning Based on Value Function Approximation

Xinyu Zhu; Yang Huang; Shaoyu Wang; Qihui Wu; Xiaohu Ge; Yuan Liu; Zhen Gao

doi:10.1109/LWC.2022.3228045

Corresponding author: Yang Huang

前沿交叉科学研究院

Research output: Contribution to journal › Article › Peer-reviewed

8 citations (Scopus)

Abstract

This letter addresses the spectrum anti-jamming problem with multiple Internet of Things (IoT) devices for uplink transmissions, where policies for configuring frequency-domain channels have to be learned without the knowledge of the time-frequency distribution of the interference. The problem of decision-making or learning is expected to be solved by reinforcement learning (RL) approaches. However, the state-of-the-art RL-based spectrum anti-jamming methods may not be applicable in IoT systems, suffer from high computational complexity or may converge to a policy that may not be the best for each user. Therefore, we propose a novel spectrum anti-jamming scheme where configuration policies for the IoT devices are sequentially optimized with value function approximation-based multi-agent RL. Simulation results show that our proposed algorithm outperforms various baselines in terms of average normalized throughput.

Original language	English
Pages (from-to)	386-390
Number of pages	5
Journal	IEEE Wireless Communications Letters
Volume	12
Issue	2
DOI	https://doi.org/10.1109/LWC.2022.3228045
Publication status	Published - 1 Feb 2023

Cite this

@article{ec39e23a263d4432bac349457a0b7a7e,

title = "Dynamic Spectrum Anti-Jamming With Reinforcement Learning Based on Value Function Approximation",

abstract = "This letter addresses the spectrum anti-jamming problem with multiple Internet of Things (IoT) devices for uplink transmissions, where policies for configuring frequency-domain channels have to be learned without the knowledge of the time-frequency distribution of the interference. The problem of decision-making or learning is expected to be solved by reinforcement learning (RL) approaches. However, the state-of-the-art RL-based spectrum anti-jamming methods may not be applicable in IoT systems, suffer from high computational complexity or may converge to a policy that may not be the best for each user. Therefore, we propose a novel spectrum anti-jamming scheme where configuration policies for the IoT devices are sequentially optimized with value function approximation-based multi-agent RL. Simulation results show that our proposed algorithm outperforms various baselines in terms of average normalized throughput.",

keywords = "Internet of Things, Markov decision process, Uplink transmissions, anti-jamming, reinforcement learning",

author = "Xinyu Zhu and Yang Huang and Shaoyu Wang and Qihui Wu and Xiaohu Ge and Yuan Liu and Zhen Gao",

note = "Publisher Copyright: {\textcopyright} 2012 IEEE.",

year = "2023",

month = feb,

day = "1",

doi = "10.1109/LWC.2022.3228045",

language = "English",

volume = "12",

pages = "386--390",

journal = "IEEE Wireless Communications Letters",

issn = "2162-2337",

publisher = "IEEE Communications Society",

number = "2",

}

TY - JOUR

T1 - Dynamic Spectrum Anti-Jamming With Reinforcement Learning Based on Value Function Approximation

AU - Zhu, Xinyu

AU - Huang, Yang

AU - Wang, Shaoyu

AU - Wu, Qihui

AU - Ge, Xiaohu

AU - Liu, Yuan

AU - Gao, Zhen

PY - 2023/2/1

Y1 - 2023/2/1

N2 - This letter addresses the spectrum anti-jamming problem with multiple Internet of Things (IoT) devices for uplink transmissions, where policies for configuring frequency-domain channels have to be learned without the knowledge of the time-frequency distribution of the interference. The problem of decision-making or learning is expected to be solved by reinforcement learning (RL) approaches. However, the state-of-the-art RL-based spectrum anti-jamming methods may not be applicable in IoT systems, suffer from high computational complexity or may converge to a policy that may not be the best for each user. Therefore, we propose a novel spectrum anti-jamming scheme where configuration policies for the IoT devices are sequentially optimized with value function approximation-based multi-agent RL. Simulation results show that our proposed algorithm outperforms various baselines in terms of average normalized throughput.

AB - This letter addresses the spectrum anti-jamming problem with multiple Internet of Things (IoT) devices for uplink transmissions, where policies for configuring frequency-domain channels have to be learned without the knowledge of the time-frequency distribution of the interference. The problem of decision-making or learning is expected to be solved by reinforcement learning (RL) approaches. However, the state-of-the-art RL-based spectrum anti-jamming methods may not be applicable in IoT systems, suffer from high computational complexity or may converge to a policy that may not be the best for each user. Therefore, we propose a novel spectrum anti-jamming scheme where configuration policies for the IoT devices are sequentially optimized with value function approximation-based multi-agent RL. Simulation results show that our proposed algorithm outperforms various baselines in terms of average normalized throughput.

KW - Internet of Things

KW - Markov decision process

KW - Uplink transmissions

KW - anti-jamming

KW - reinforcement learning

UR - http://www.scopus.com/inward/record.url?scp=85144796918&partnerID=8YFLogxK

U2 - 10.1109/LWC.2022.3228045

DO - 10.1109/LWC.2022.3228045

M3 - Article

AN - SCOPUS:85144796918

SN - 2162-2337

VL - 12

SP - 386

EP - 390

JO - IEEE Wireless Communications Letters

JF - IEEE Wireless Communications Letters

IS - 2

ER -
