TY - JOUR
T1 - Multi-agent policy learning-based path planning for autonomous mobile robots
AU - Zhang, Lixiang
AU - Cai, Ze
AU - Yan, Yan
AU - Yang, Chen
AU - Hu, Yaoguang
N1 - Publisher Copyright:
© 2023 Elsevier Ltd
PY - 2024/3
Y1 - 2024/3
N2 - This study addresses path planning for autonomous mobile robots (AMRs) under kinematic constraints, where performance and responsiveness are often at odds. It proposes a multi-agent policy learning-based method to tackle this challenge in dynamic environments. The method features a path planning framework based on centralized learning and decentralized execution, designed to meet both performance and responsiveness requirements. The problem is modeled as a partially observable Markov decision process for policy learning, with the kinematics captured by conventional neural networks. An improved proximal policy optimization algorithm is then developed with highlight experience replay, which corrects failed experiences to speed up learning. Experimental results show that the proposed method outperforms the baselines in both static and dynamic environments, shortening movement distance and time by about 29.1% and 5.7% in static environments and by about 21.1% and 20.4% in dynamic environments, respectively. The runtime remains at the millisecond level across environments, at only 0.07 s. Overall, the proposed method is effective and efficient in ensuring the performance and responsiveness of AMRs on complex and dynamic path planning problems.
AB - This study addresses path planning for autonomous mobile robots (AMRs) under kinematic constraints, where performance and responsiveness are often at odds. It proposes a multi-agent policy learning-based method to tackle this challenge in dynamic environments. The method features a path planning framework based on centralized learning and decentralized execution, designed to meet both performance and responsiveness requirements. The problem is modeled as a partially observable Markov decision process for policy learning, with the kinematics captured by conventional neural networks. An improved proximal policy optimization algorithm is then developed with highlight experience replay, which corrects failed experiences to speed up learning. Experimental results show that the proposed method outperforms the baselines in both static and dynamic environments, shortening movement distance and time by about 29.1% and 5.7% in static environments and by about 21.1% and 20.4% in dynamic environments, respectively. The runtime remains at the millisecond level across environments, at only 0.07 s. Overall, the proposed method is effective and efficient in ensuring the performance and responsiveness of AMRs on complex and dynamic path planning problems.
KW - Deep reinforcement learning
KW - Dynamics
KW - Multi-agent systems
KW - Path planning
KW - Proximal policy optimization
UR - http://www.scopus.com/inward/record.url?scp=85178158468&partnerID=8YFLogxK
U2 - 10.1016/j.engappai.2023.107631
DO - 10.1016/j.engappai.2023.107631
M3 - Article
AN - SCOPUS:85178158468
SN - 0952-1976
VL - 129
JO - Engineering Applications of Artificial Intelligence
JF - Engineering Applications of Artificial Intelligence
M1 - 107631
ER -