TY - JOUR
T1 - Large-Scale Multirobot Task Planning Using Efficient Hierarchical Reinforcement Learning
AU - Zhou, Xuan
AU - Shi, Xiang
AU - Zhang, Lele
AU - Chen, Chen
AU - Li, Hongbo
AU - Ma, Lin
AU - Deng, Fang
AU - Chen, Jie
N1 - Publisher Copyright: © 2004-2012 IEEE.
PY - 2026
Y1 - 2026
AB - Multirobot task planning (MRTP) at scale in robotic mobile fulfillment systems (RMFS) remains challenging due to the curse of dimensionality and complex dynamic properties. To address these challenges, we construct an end-to-end multirobot task planner that scales to large systems by learning hierarchical planning policies. In this planner, we design a centralized hierarchical temporal task planning framework that mitigates the curse of dimensionality while ensuring timely dynamic response. Following this framework, we propose a novel cycle-constrained asynchronous temporal graph as a foundation for modeling the system dynamics. Based on this graph representation, we formulate the MRTP problem as a semi-Markov decision process (SMDP) that focuses solely on critical interaction points to improve computational and sampling efficiency. The SMDP policies are parameterized by a hierarchical temporal attention network with temporal embedding layers that enhance spatio-temporal feature extraction. In addition, the decoder masks in this network ensure that the generated actions strictly satisfy the required dynamic hard constraints. The hierarchical policies are jointly optimized using an efficient hierarchical REINFORCE method with a rollout counterfactual baseline. To further improve generalization to unseen instances while preventing catastrophic forgetting, we extend this method with region expansion curricula. Experiments demonstrate that our planner outperforms state-of-the-art methods on a range of MRTP instances across simulated and real-world RMFS. It scales to instances with up to 200 robots and 1000 retrieval racks on unseen maps while maintaining its performance advantage.
KW - Hierarchical reinforcement learning (HRL)
KW - large-scale multirobot task planning
KW - warehousing systems
UR - https://www.scopus.com/pages/publications/105036533358
U2 - 10.1109/TRO.2026.3686265
DO - 10.1109/TRO.2026.3686265
M3 - Article
AN - SCOPUS:105036533358
SN - 1552-3098
VL - 42
SP - 2146
EP - 2165
JO - IEEE Transactions on Robotics
JF - IEEE Transactions on Robotics
ER -