TY - JOUR
T1 - A Two-Layered Reinforcement Learning Framework for AoI-Aware Trajectory Planning and Scheduling Optimization in Multi-UAV Networks
AU - Fu, Kang
AU - Zhao, Qingjie
AU - Wang, Lei
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Unmanned aerial vehicles (UAVs) have emerged as an effective solution for data collection in Internet of Things (IoT) networks. To maintain data freshness, the age of information (AoI) has become a key performance metric, which is jointly influenced by UAV trajectory planning and sensor node (SN) scheduling. However, optimizing these two interdependent tasks simultaneously leads to high-dimensional decision spaces and unstable learning dynamics. To solve this problem, we propose a two-layered reinforcement learning framework for AoI-aware trajectory planning and scheduling optimization, named TL-RATS. In the upper layer, a reinforcement learning module is designed to learn long-term UAV trajectories by using the agent-by-agent policy optimization (A2PO) algorithm, enhanced by sequential updates and preceding-agent off-policy correction (PreOPC) to ensure sample-efficient and stable learning. In the lower layer, we formulate the scheduling problem as a time-constrained 0-1 knapsack optimization, where each item's weight represents data collection and transmission time, and its value corresponds to potential AoI reduction. A lightweight dynamic programming (DP) algorithm is used to allocate transmission opportunities under time constraints. Extensive experiments under diverse SN distributions demonstrate that TL-RATS significantly reduces AoI and outperforms representative baselines, including MAPPO, IPPO, MAT, greedy scheduling, and fully joint policy. These results highlight the benefits of the proposed layered design and task-specific coordination.
AB - Unmanned aerial vehicles (UAVs) have emerged as an effective solution for data collection in Internet of Things (IoT) networks. To maintain data freshness, the age of information (AoI) has become a key performance metric, which is jointly influenced by UAV trajectory planning and sensor node (SN) scheduling. However, optimizing these two interdependent tasks simultaneously leads to high-dimensional decision spaces and unstable learning dynamics. To solve this problem, we propose a two-layered reinforcement learning framework for AoI-aware trajectory planning and scheduling optimization, named TL-RATS. In the upper layer, a reinforcement learning module is designed to learn long-term UAV trajectories by using the agent-by-agent policy optimization (A2PO) algorithm, enhanced by sequential updates and preceding-agent off-policy correction (PreOPC) to ensure sample-efficient and stable learning. In the lower layer, we formulate the scheduling problem as a time-constrained 0-1 knapsack optimization, where each item's weight represents data collection and transmission time, and its value corresponds to potential AoI reduction. A lightweight dynamic programming (DP) algorithm is used to allocate transmission opportunities under time constraints. Extensive experiments under diverse SN distributions demonstrate that TL-RATS significantly reduces AoI and outperforms representative baselines, including MAPPO, IPPO, MAT, greedy scheduling, and fully joint policy. These results highlight the benefits of the proposed layered design and task-specific coordination.
KW - Age of information
KW - deep reinforcement learning
KW - planning and scheduling optimization
KW - unmanned aerial vehicles
UR - https://www.scopus.com/pages/publications/105023277163
U2 - 10.1109/JIOT.2025.3636204
DO - 10.1109/JIOT.2025.3636204
M3 - Article
AN - SCOPUS:105023277163
SN - 2327-4662
JO - IEEE Internet of Things Journal
JF - IEEE Internet of Things Journal
ER -