Abstract
Multi-robot task planning (MRTP) at scale in robotic mobile fulfillment systems (RMFS) remains a challenge due to the curse of dimensionality and complex dynamic properties. To address these challenges, we construct an end-to-end multi-robot task planner that scales to large systems by learning hierarchical planning policies. In this planner, we design a centralized hierarchical temporal task planning framework to mitigate the curse of dimensionality while ensuring timely dynamic response. Following this framework, we propose a novel cycle-constrained asynchronous temporal graph (CycATG) to provide a foundation for modeling the system dynamics. Based on the graph representation, we formulate the MRTP problem as a semi-Markov decision process (SMDP) that focuses solely on critical interaction points to improve computational and sampling efficiency. The policies in the SMDP are parameterized via a hierarchical temporal attention network with temporal embedding layers to enhance spatio-temporal feature extraction. Additionally, the decoder masks in this network ensure that the generated actions strictly satisfy the required dynamic hard constraints. The hierarchical policies are jointly optimized using an efficient hierarchical REINFORCE method with a rollout counterfactual baseline. To further enhance generalization performance on unlearned instances while preventing catastrophic forgetting, we extend the training method with region expansion curricula. Experiments demonstrate that our planner outperforms state-of-the-art methods on different MRTP instances across simulated and real-world RMFS. It successfully scales to instances with up to 200 robots and 1000 retrieval racks on unlearned maps while maintaining performance advantages.
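The abstract does not give implementation details, so the following is only a minimal sketch of two ideas it names: a decoder mask that assigns zero probability to infeasible actions, and a REINFORCE gradient weighted by the gap between a sampled cost and a baseline cost. The function names, the flat logit vector, and the example numbers are assumptions made for illustration. The baseline here is a plain scalar, not the paper's rollout counterfactual baseline.

```python
import math


def masked_softmax(logits, feasible):
    """Softmax over logits; infeasible entries receive probability 0."""
    if not any(feasible):
        raise ValueError("at least one action must be feasible")
    # Subtract the largest feasible logit for numerical stability.
    top = max(l for l, ok in zip(logits, feasible) if ok)
    exps = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, feasible)]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_logit_grad(probs, action, sampled_cost, baseline_cost):
    """Gradient of the REINFORCE loss with respect to the logits.

    Uses d log p(action) / d logit_i = 1[i == action] - p_i, scaled by the
    advantage (sampled_cost - baseline_cost). Assumes cost is minimized.
    """
    advantage = sampled_cost - baseline_cost
    return [
        advantage * ((1.0 if i == action else 0.0) - p)
        for i, p in enumerate(probs)
    ]


# Hypothetical example: four candidate racks, rack 2 is not selectable.
logits = [1.0, 2.0, 5.0, 0.5]
feasible = [True, True, False, True]
probs = masked_softmax(logits, feasible)

# Suppose rack 1 was sampled and cost 12.0, while the baseline cost was 10.0.
grad = reinforce_logit_grad(probs, action=1, sampled_cost=12.0, baseline_cost=10.0)
```

Because the masked entry has probability zero, it can never be sampled and its logit receives no gradient, so the hard constraint holds by construction rather than through a penalty term. A gradient-descent step on `grad` lowers the probability of the sampled action when its cost exceeds the baseline and raises it otherwise.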
| Original language | English |
|---|---|
| Journal | IEEE Transactions on Robotics |
| DOIs | |
| Publication status | Accepted/In press - 2026 |
| Externally published | Yes |
Keywords
- Large-scale multi-robot task planning
- hierarchical reinforcement learning
- warehousing systems
Fingerprint
Dive into the research topics of 'Large-Scale Multi-Robot Task Planning using Efficient Hierarchical Reinforcement Learning'. Together they form a unique fingerprint.