TY - JOUR
T1 - Research on LSTM-PPO Obstacle Avoidance Algorithm and Training Environment for Unmanned Surface Vehicles
AU - Luo, Wangbin
AU - Wang, Xiang
AU - Han, Fang
AU - Zhou, Zhiguo
AU - Cai, Junyu
AU - Zeng, Lin
AU - Chen, Hong
AU - Chen, Jiawei
AU - Zhou, Xuehua
N1 - Publisher Copyright:
© 2025 by the authors.
PY - 2025/3
Y1 - 2025/3
N2 - Current unmanned surface vehicle (USV) intelligent obstacle avoidance algorithms based on deep reinforcement learning usually adopt a mass-point model trained in an idealized environment. In actual navigation, however, the ship's dynamics and the water-surface environment cause the reward function set during training to diverge from real conditions, resulting in poor obstacle avoidance performance. To address this problem, this paper proposes a long short-term memory network-proximal policy optimization (LSTM-PPO) intelligent obstacle avoidance algorithm for non-particle models in non-ideal environments and designs a corresponding deep reinforcement learning training environment. We integrate the USV's motion characteristics and the influencing factors of the water-surface environment into a curiosity-driven reward function to improve autonomous obstacle avoidance, and combine this with an LSTM network that identifies and stores obstacle information to improve adaptability to unknown environments. Virtual simulation is performed in the Unity engine, which builds a USV physical model and a refined on-water deep reinforcement learning training environment containing a variety of obstacle models. The experimental results demonstrate that the LSTM-PPO algorithm achieves effective and rational obstacle avoidance, with a success rate of 86.7%, an average path length of 198.52 m, and a convergence time of 1.5 h. Compared with three other deep reinforcement learning algorithms, the LSTM-PPO algorithm reduces average convergence time by 21.5%, reduces average path length by 18.5%, and improves the obstacle avoidance success rate by approximately 20% in complex environments. These results indicate that the LSTM-PPO algorithm can effectively improve search efficiency and optimize path planning in obstacle avoidance for USVs.
AB - Current unmanned surface vehicle (USV) intelligent obstacle avoidance algorithms based on deep reinforcement learning usually adopt a mass-point model trained in an idealized environment. In actual navigation, however, the ship's dynamics and the water-surface environment cause the reward function set during training to diverge from real conditions, resulting in poor obstacle avoidance performance. To address this problem, this paper proposes a long short-term memory network-proximal policy optimization (LSTM-PPO) intelligent obstacle avoidance algorithm for non-particle models in non-ideal environments and designs a corresponding deep reinforcement learning training environment. We integrate the USV's motion characteristics and the influencing factors of the water-surface environment into a curiosity-driven reward function to improve autonomous obstacle avoidance, and combine this with an LSTM network that identifies and stores obstacle information to improve adaptability to unknown environments. Virtual simulation is performed in the Unity engine, which builds a USV physical model and a refined on-water deep reinforcement learning training environment containing a variety of obstacle models. The experimental results demonstrate that the LSTM-PPO algorithm achieves effective and rational obstacle avoidance, with a success rate of 86.7%, an average path length of 198.52 m, and a convergence time of 1.5 h. Compared with three other deep reinforcement learning algorithms, the LSTM-PPO algorithm reduces average convergence time by 21.5%, reduces average path length by 18.5%, and improves the obstacle avoidance success rate by approximately 20% in complex environments. These results indicate that the LSTM-PPO algorithm can effectively improve search efficiency and optimize path planning in obstacle avoidance for USVs.
KW - USV
KW - deep reinforcement learning
KW - obstacle avoidance
KW - proximal policy optimization
KW - reward function
UR - https://www.scopus.com/pages/publications/105001152399
U2 - 10.3390/jmse13030479
DO - 10.3390/jmse13030479
M3 - Article
AN - SCOPUS:105001152399
SN - 2077-1312
VL - 13
JO - Journal of Marine Science and Engineering
JF - Journal of Marine Science and Engineering
IS - 3
M1 - 479
ER -