TY - JOUR
T1 - Primal-Dual Deep Reinforcement Learning for Periodic Coverage-Assisted UAV Secure Communications
AU - Qin, Yunhui
AU - Xing, Zhifang
AU - Li, Xulong
AU - Zhang, Zhongshan
AU - Zhang, Haijun
N1 - Publisher Copyright:
IEEE
PY - 2024
Y1 - 2024
N2 - Considering the UAVs' energy constraints and green communication requirements, this paper proposes a periodic coverage-assisted UAV secure communication system to maximize the worst-case average achievable secrecy rate. UAV base stations serve legitimate users while UAV jammers periodically transmit interference signals toward eavesdroppers. User scheduling, UAV trajectory and power allocation are modeled as a constrained Markov decision problem with a coverage evaluation constraint. Then, the joint optimization of user scheduling, UAV trajectory and power allocation is achieved by the primal-dual soft actor-critic (SAC) algorithm. Specifically, the reward critic network assesses the secrecy rate and the cost critic network fits the coverage constraint. Meanwhile, the actor network generates the user scheduling, UAV trajectory and power allocation policy while updating the dual variables. For comparison, we also adopt other deep reinforcement learning (DRL) solutions, namely the SAC algorithm and the twin-delayed deep deterministic policy gradient (TD3) algorithm, as well as the traditional random method and greedy method. Simulation results show that the proposed algorithm performs best in training speed, reward performance and secrecy rate.
AB - Considering the UAVs' energy constraints and green communication requirements, this paper proposes a periodic coverage-assisted UAV secure communication system to maximize the worst-case average achievable secrecy rate. UAV base stations serve legitimate users while UAV jammers periodically transmit interference signals toward eavesdroppers. User scheduling, UAV trajectory and power allocation are modeled as a constrained Markov decision problem with a coverage evaluation constraint. Then, the joint optimization of user scheduling, UAV trajectory and power allocation is achieved by the primal-dual soft actor-critic (SAC) algorithm. Specifically, the reward critic network assesses the secrecy rate and the cost critic network fits the coverage constraint. Meanwhile, the actor network generates the user scheduling, UAV trajectory and power allocation policy while updating the dual variables. For comparison, we also adopt other deep reinforcement learning (DRL) solutions, namely the SAC algorithm and the twin-delayed deep deterministic policy gradient (TD3) algorithm, as well as the traditional random method and greedy method. Simulation results show that the proposed algorithm performs best in training speed, reward performance and secrecy rate.
KW - Autonomous aerial vehicles
KW - Communication system security
KW - constrained Markov decision process
KW - deep reinforcement learning
KW - Jamming
KW - Optimization
KW - periodic coverage evaluation
KW - primal-dual optimization
KW - Resource management
KW - Security
KW - Trajectory
KW - Unmanned aerial vehicle (UAV)
UR - http://www.scopus.com/inward/record.url?scp=85202766230&partnerID=8YFLogxK
U2 - 10.1109/TVT.2024.3450956
DO - 10.1109/TVT.2024.3450956
M3 - Article
AN - SCOPUS:85202766230
SN - 0018-9545
SP - 1
EP - 12
JO - IEEE Transactions on Vehicular Technology
JF - IEEE Transactions on Vehicular Technology
ER -