TY - JOUR
T1 - Deep Reinforcement Learning Based Resource Allocation and Trajectory Planning in Integrated Sensing and Communications UAV Network
AU - Qin, Yunhui
AU - Zhang, Zhongshan
AU - Li, Xulong
AU - Huangfu, Wei
AU - Zhang, Haijun
N1 - Publisher Copyright:
© 2002-2012 IEEE.
PY - 2023/11/1
Y1 - 2023/11/1
N2 - In this paper, multiple UAVs serve as mobile aerial ISAC platforms to sense and communicate with ground target users. To optimize the communication and sensing performance, we formulate a joint user association, UAV trajectory planning, and power allocation problem to maximize the minimum weighted spectral efficiency among UAVs. This paper exploits both centralized and decentralized deep reinforcement learning (DRL) solutions to solve the resulting sequential decision-making problem. On the one hand, we first introduce the centralized soft actor-critic (SAC) algorithm. Then, we explore an equivalent transformation of the optimization objective based on the symmetric group, propose random and adaptive data augmentation schemes to design the replay memory buffer of SAC, and accordingly propose data-augmentation-assisted SAC algorithms to tackle the transformed problem. On the other hand, the multi-agent soft actor-critic (MASAC), a decentralized solution, is also introduced to solve this sequential decision-making problem. The experimental results demonstrate the effectiveness of both the centralized and the decentralized solutions in the considered scenarios. Specifically, the SAC algorithm assisted by the adaptive scheme significantly outperforms the other centralized solutions in training speed and weighted spectral efficiency. Meanwhile, the decentralized MASAC algorithm achieves the fastest early-stage training.
AB - In this paper, multiple UAVs serve as mobile aerial ISAC platforms to sense and communicate with ground target users. To optimize the communication and sensing performance, we formulate a joint user association, UAV trajectory planning, and power allocation problem to maximize the minimum weighted spectral efficiency among UAVs. This paper exploits both centralized and decentralized deep reinforcement learning (DRL) solutions to solve the resulting sequential decision-making problem. On the one hand, we first introduce the centralized soft actor-critic (SAC) algorithm. Then, we explore an equivalent transformation of the optimization objective based on the symmetric group, propose random and adaptive data augmentation schemes to design the replay memory buffer of SAC, and accordingly propose data-augmentation-assisted SAC algorithms to tackle the transformed problem. On the other hand, the multi-agent soft actor-critic (MASAC), a decentralized solution, is also introduced to solve this sequential decision-making problem. The experimental results demonstrate the effectiveness of both the centralized and the decentralized solutions in the considered scenarios. Specifically, the SAC algorithm assisted by the adaptive scheme significantly outperforms the other centralized solutions in training speed and weighted spectral efficiency. Meanwhile, the decentralized MASAC algorithm achieves the fastest early-stage training.
KW - Integrated sensing and communications (ISAC)
KW - deep reinforcement learning
KW - power allocation
KW - trajectory planning
KW - unmanned aerial vehicle (UAV)
UR - http://www.scopus.com/inward/record.url?scp=85151534164&partnerID=8YFLogxK
U2 - 10.1109/TWC.2023.3260304
DO - 10.1109/TWC.2023.3260304
M3 - Article
AN - SCOPUS:85151534164
SN - 1536-1276
VL - 22
SP - 8158
EP - 8169
JO - IEEE Transactions on Wireless Communications
JF - IEEE Transactions on Wireless Communications
IS - 11
ER -