TY - JOUR
T1 - Optimizing resource allocation in UAV-assisted ultra-dense networks for enhanced performance and security
AU - Ye, Pei Gen
AU - Zheng, Jun
AU - Ren, Xiaojun
AU - Huang, Jinbin
AU - Zhang, Zhenxin
AU - Pang, Yan
AU - Kou, Guang
N1 - Publisher Copyright:
© 2024
PY - 2024/9
Y1 - 2024/9
N2 - The deployment of unmanned aerial vehicles (UAVs) in ultra-dense networks (UDNs) has significantly advanced network capabilities in 5G/6G environments, addressing coverage enhancement and security concerns. Our research presents a deep reinforcement learning (DRL) based approach designed to manage the increasing data traffic demands and limited communication resources in UAV-assisted UDNs. Traditional DRL methodologies often struggle with challenges such as low sample efficiency and energy wastage, which can indirectly impact network security and stability. To address these concerns, we introduce the Stabilizing Transformers based Potential Driven Reinforcement Learning (STPD-RL) framework. STPD-RL optimizes critical network operations such as transmission link selection and power allocation, directly contributing to improved energy efficiency and robust network performance. First, we refine potential driven experience replay and apply it to resource allocation in UAV-assisted UDNs for the first time. By assigning a potential energy function to each state in experience replay, the agent can use intrinsic state supervision to learn from a spectrum of good and bad experiences. Second, we employ stabilizing transformers to accelerate the learning of resource allocation policies, thereby enhancing the stability of model training. Finally, we integrate potential driven experience replay and stabilizing transformers within the Proximal Policy Optimization algorithm, yielding our tailored STPD-PPO. In simulations with many users and base stations, STPD-PPO outperformed traditional PPO on metrics such as entropy loss, policy loss, and value loss. The results suggest that STPD-PPO surpasses traditional DRL algorithms in several respects, including convergence rate, energy efficiency, total power consumption, and exploration capacity.
AB - The deployment of unmanned aerial vehicles (UAVs) in ultra-dense networks (UDNs) has significantly advanced network capabilities in 5G/6G environments, addressing coverage enhancement and security concerns. Our research presents a deep reinforcement learning (DRL) based approach designed to manage the increasing data traffic demands and limited communication resources in UAV-assisted UDNs. Traditional DRL methodologies often struggle with challenges such as low sample efficiency and energy wastage, which can indirectly impact network security and stability. To address these concerns, we introduce the Stabilizing Transformers based Potential Driven Reinforcement Learning (STPD-RL) framework. STPD-RL optimizes critical network operations such as transmission link selection and power allocation, directly contributing to improved energy efficiency and robust network performance. First, we refine potential driven experience replay and apply it to resource allocation in UAV-assisted UDNs for the first time. By assigning a potential energy function to each state in experience replay, the agent can use intrinsic state supervision to learn from a spectrum of good and bad experiences. Second, we employ stabilizing transformers to accelerate the learning of resource allocation policies, thereby enhancing the stability of model training. Finally, we integrate potential driven experience replay and stabilizing transformers within the Proximal Policy Optimization algorithm, yielding our tailored STPD-PPO. In simulations with many users and base stations, STPD-PPO outperformed traditional PPO on metrics such as entropy loss, policy loss, and value loss. The results suggest that STPD-PPO surpasses traditional DRL algorithms in several respects, including convergence rate, energy efficiency, total power consumption, and exploration capacity.
KW - Deep reinforcement learning
KW - Experience replay
KW - Resource allocation
KW - Transformers
KW - Ultra-dense network
UR - http://www.scopus.com/inward/record.url?scp=85197560359&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2024.120788
DO - 10.1016/j.ins.2024.120788
M3 - Article
AN - SCOPUS:85197560359
SN - 0020-0255
VL - 679
JO - Information Sciences
JF - Information Sciences
M1 - 120788
ER -