TY - JOUR
T1 - Learning Multi-Agent Reservoir Cooperative Operations Over Multi-Relational Directed Acyclic Graph
AU - He, Qiyong
AU - Li, Xiuxian
AU - Liang, Li
AU - Chen, Chen
AU - Deng, Fang
N1 - Publisher Copyright: © 2004-2012 IEEE.
PY - 2026
Y1 - 2026
N2 - Operating large multi-reservoir systems is critical for effective water allocation, hydropower generation, and economic development. However, conventional robust planning-based methods scale poorly beyond single-reservoir or cascaded systems due to computational intractability. Addressing inter-reservoir relation extraction and coordination under conflicting objectives, uncertain inflows, and complex network couplings therefore remains an open challenge. To tackle this problem, we introduce the multi-agent reservoir cooperative operation (MARCO) environment, which integrates multiple objectives, heterogeneous topology, and stochastic inflows in a unified framework. An algorithm is then designed to construct a multi-relational directed acyclic graph (MR-DAG) that encodes the underlying topology through coupled objectives and entity relations. Building on this representation, we propose the multi-agent relational directed acyclic graph transformer (MAR-DAGT), a reinforcement learning algorithm that performs typed message passing for efficient feature extraction and employs acyclic decision-making to exploit causal structure for improved credit assignment. Extensive experiments on MARCO show that MAR-DAGT consistently outperforms other MARL and optimization-based baselines in terms of objective satisfaction and robustness to inflow uncertainty. Note to Practitioners: This paper investigates cooperative operations in multi-reservoir water allocation networks. Traditionally, this problem has been addressed through planning-based methods that offer robustness guarantees. However, these methods face significant limitations as the number of reservoir stakeholders increases, resulting in exponential growth in problem complexity. With the growing demand for high-temporal-granularity scheduling and refined reservoir management strategies, the number of involved stakeholders continues to rise, further complicating the network topology. To overcome these challenges, we propose a learning-based operational method for multi-reservoir systems. The proposed approach effectively uncovers the underlying network topology and captures causal relationships among multiple objectives within the reservoir network under inflow uncertainties. Experimental evaluations conducted on synthetic reservoir cooperation scenarios demonstrate that the proposed method surpasses traditional planning-based approaches and existing learning-based solutions across several performance metrics, including civilian water supply, hydropower generation, and ecological conservation. Moreover, the sensitivity analysis conducted under out-of-distribution inflow noise empirically examines the stability of the proposed approach. Although the proposed method has not yet been validated in real-world reservoir systems, it shows promising potential to enhance the economic and ecological efficiency of large-scale multi-reservoir management practices. Future research will focus on extending the method to accommodate dynamic topologies in water allocation networks.
KW - Multi-reservoir operations
KW - directed acyclic graph
KW - heterogeneous graph
KW - multi-agent reinforcement learning
KW - uncertainty
UR - https://www.scopus.com/pages/publications/105024585746
U2 - 10.1109/TASE.2025.3641551
DO - 10.1109/TASE.2025.3641551
M3 - Article
AN - SCOPUS:105024585746
SN - 1545-5955
VL - 23
SP - 846
EP - 858
JO - IEEE Transactions on Automation Science and Engineering
JF - IEEE Transactions on Automation Science and Engineering
ER -