TY - JOUR
T1 - Dependency-Elimination MADRL
T2 - Scalable On-Board Resource Allocation for Feeder- and User-Link Integrated Satellite Communications
AU - Ouyang, Qiaolin
AU - Ye, Neng
AU - Shin, Wonjae
AU - Gao, Xiaozheng
AU - Niyato, Dusit
AU - Yang, Kai
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Integrating feeder- and user-links in multi-beam satellite communications significantly enhances system flexibility but requires effective resource allocation to fully realize its potential. Multi-agent deep reinforcement learning (MADRL) has emerged as a scalable solution for beam hopping by allowing each agent to optimize the transmission parameters for one beam. However, integrating feeder- and user-links introduces complicated dependencies, including resource competition between feeder- and user-links and data-flow coupling between uplinks and downlinks, which dramatically degrade agent cooperation. To approach the performance limit, this paper introduces a dependency-elimination MADRL framework incorporating model decomposition, link decoupling, and novel agent-level collaboration mechanisms to allocate beams, power, and bandwidth with reduced complexity. Specifically, to facilitate beam-level agent reuse for complexity reduction under the heterogeneity of feeder- and user-links, characterized by data-flow aggregation and division, we decouple bandwidth allocation from the learning model. The uplink-downlink dependencies in bandwidth allocation are then resolved using a generalized water-filling strategy based on performance upper bounds. Furthermore, we improve agent cooperation efficiency through state and reward decomposition and a novel non-cooperation penalty. Evaluations show that our method improves system performance by up to 57.7% compared to state-of-the-art MADRL methods while reducing training complexity by more than 50%.
AB - Integrating feeder- and user-links in multi-beam satellite communications significantly enhances system flexibility but requires effective resource allocation to fully realize its potential. Multi-agent deep reinforcement learning (MADRL) has emerged as a scalable solution for beam hopping by allowing each agent to optimize the transmission parameters for one beam. However, integrating feeder- and user-links introduces complicated dependencies, including resource competition between feeder- and user-links and data-flow coupling between uplinks and downlinks, which dramatically degrade agent cooperation. To approach the performance limit, this paper introduces a dependency-elimination MADRL framework incorporating model decomposition, link decoupling, and novel agent-level collaboration mechanisms to allocate beams, power, and bandwidth with reduced complexity. Specifically, to facilitate beam-level agent reuse for complexity reduction under the heterogeneity of feeder- and user-links, characterized by data-flow aggregation and division, we decouple bandwidth allocation from the learning model. The uplink-downlink dependencies in bandwidth allocation are then resolved using a generalized water-filling strategy based on performance upper bounds. Furthermore, we improve agent cooperation efficiency through state and reward decomposition and a novel non-cooperation penalty. Evaluations show that our method improves system performance by up to 57.7% compared to state-of-the-art MADRL methods while reducing training complexity by more than 50%.
KW - Multi-beam satellite
KW - deep reinforcement learning
KW - feeder- and user-link integration
KW - unified resource allocation
UR - http://www.scopus.com/inward/record.url?scp=85215851557&partnerID=8YFLogxK
U2 - 10.1109/TCOMM.2025.3529212
DO - 10.1109/TCOMM.2025.3529212
M3 - Article
AN - SCOPUS:85215851557
SN - 1558-0857
JO - IEEE Transactions on Communications
JF - IEEE Transactions on Communications
ER -