TY - JOUR
T1 - Dependency-Elimination MADRL
T2 - Scalable On-Board Resource Allocation for Feeder- and User-Link Integrated Satellite Communications
AU - Ouyang, Qiaolin
AU - Ye, Neng
AU - Shin, Wonjae
AU - Gao, Xiaozheng
AU - Niyato, Dusit
AU - Yang, Kai
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Integrating feeder- and user-links in multi-beam satellite communications significantly enhances system flexibility but requires effective resource allocation to fully realize its potential. Multi-agent deep reinforcement learning (MADRL) has emerged as a scalable solution for beam hopping by allowing each agent to optimize the transmission parameters for one beam. However, integrating feeder- and user-links introduces complicated dependencies, including resource competition between feeder- and user-links and data-flow coupling between uplinks and downlinks, which dramatically degrade agent cooperation. To approach the performance limit, this paper introduces a dependency-elimination MADRL framework incorporating model decomposition, link decoupling, and novel agent-level collaboration mechanisms to allocate beams, power, and bandwidth with reduced complexity. Specifically, to facilitate beam-level agent reuse for complexity reduction under the heterogeneity of feeder- and user-links, characterized by data-flow aggregation and division, we decouple bandwidth allocation from the learning model. The uplink-downlink dependencies in bandwidth allocation are then resolved using a generalized water-filling strategy based on performance upper bounds. Furthermore, we improve agent cooperation efficiency through state and reward decomposition and a novel non-cooperation penalty. Evaluations show that our method improves system performance by up to 57.7% compared to state-of-the-art MADRL methods while reducing training complexity by more than 50%.
AB - Integrating feeder- and user-links in multi-beam satellite communications significantly enhances system flexibility but requires effective resource allocation to fully realize its potential. Multi-agent deep reinforcement learning (MADRL) has emerged as a scalable solution for beam hopping by allowing each agent to optimize the transmission parameters for one beam. However, integrating feeder- and user-links introduces complicated dependencies, including resource competition between feeder- and user-links and data-flow coupling between uplinks and downlinks, which dramatically degrade agent cooperation. To approach the performance limit, this paper introduces a dependency-elimination MADRL framework incorporating model decomposition, link decoupling, and novel agent-level collaboration mechanisms to allocate beams, power, and bandwidth with reduced complexity. Specifically, to facilitate beam-level agent reuse for complexity reduction under the heterogeneity of feeder- and user-links, characterized by data-flow aggregation and division, we decouple bandwidth allocation from the learning model. The uplink-downlink dependencies in bandwidth allocation are then resolved using a generalized water-filling strategy based on performance upper bounds. Furthermore, we improve agent cooperation efficiency through state and reward decomposition and a novel non-cooperation penalty. Evaluations show that our method improves system performance by up to 57.7% compared to state-of-the-art MADRL methods while reducing training complexity by more than 50%.
KW - Multi-beam satellite
KW - deep reinforcement learning
KW - feeder- and user-link integration
KW - unified resource allocation
UR - http://www.scopus.com/inward/record.url?scp=85215851557&partnerID=8YFLogxK
U2 - 10.1109/TCOMM.2025.3529212
DO - 10.1109/TCOMM.2025.3529212
M3 - Article
AN - SCOPUS:85215851557
SN - 1558-0857
JO - IEEE Transactions on Communications
JF - IEEE Transactions on Communications
ER -