Research on LSTM-PPO Obstacle Avoidance Algorithm and Training Environment for Unmanned Surface Vehicles

  • Wangbin Luo
  • Xiang Wang
  • Fang Han
  • Zhiguo Zhou*
  • Junyu Cai
  • Lin Zeng
  • Hong Chen
  • Jiawei Chen
  • Xuehua Zhou

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

Current unmanned surface vehicle (USV) intelligent obstacle avoidance algorithms based on deep reinforcement learning usually adopt a point-mass model trained in an idealized environment. In actual navigation, however, the ship's dynamics and the water-surface environment cause the reward function set during training to diverge from the real situation, resulting in poor obstacle avoidance performance. To address these problems, this paper proposes a long short-term memory network-proximal policy optimization (LSTM-PPO) intelligent obstacle avoidance algorithm for non-point-mass models in non-ideal environments and designs a corresponding deep reinforcement learning training environment. We integrate the motion characteristics of the USV and the influencing factors of the water-surface environment into a curiosity-driven reward function to improve autonomous obstacle avoidance, and combine it with an LSTM network that identifies and stores obstacle information to improve adaptability to unknown environments. A virtual simulation is built in the Unity engine, comprising a USV physical model and a refined water-surface deep reinforcement learning training environment containing a variety of obstacle models. The experimental results demonstrate that the LSTM-PPO algorithm achieves effective and rational obstacle avoidance, with a success rate of 86.7%, an average path length of 198.52 m, and a convergence time of 1.5 h. Compared with three other deep reinforcement learning algorithms, LSTM-PPO reduces average convergence time by 21.5% and average path length by 18.5%, and improves the obstacle avoidance success rate in complex environments by approximately 20%.
These results indicate that the LSTM-PPO algorithm can effectively enhance search efficiency and optimize path planning in obstacle avoidance for unmanned surface vehicles, making its behavior more rational.
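The two building blocks named in the abstract, PPO's clipped surrogate objective and a curiosity-driven (prediction-error) intrinsic reward, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; the function names, the `eps` clipping range of 0.2, and the `scale` factor are illustrative assumptions, and the curiosity bonus follows the common ICM-style forward-model formulation rather than the authors' exact reward design.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A).

    `ratio` is the probability ratio pi_new(a|s) / pi_old(a|s);
    clipping keeps policy updates within a trust-region-like bound.
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(unclipped, clipped)

def curiosity_bonus(pred_next_state, next_state, scale=0.1):
    """ICM-style intrinsic reward: scaled forward-model prediction error.

    States the agent cannot yet predict well yield a larger bonus,
    encouraging exploration of unfamiliar obstacle configurations.
    """
    error = np.sum((np.asarray(pred_next_state) - np.asarray(next_state)) ** 2)
    return scale * error

# With a positive advantage, a large ratio is capped at 1 + eps:
print(ppo_clip_objective(1.5, 1.0))   # clipped to 1.2
# With a negative advantage, the min picks the more pessimistic term:
print(ppo_clip_objective(0.5, -1.0))  # -0.8
# A unit prediction error yields a bonus of scale * 1.0:
print(curiosity_bonus([1.0, 0.0], [0.0, 0.0]))  # 0.1
```

In a full agent, the curiosity bonus would be added to the extrinsic reward at each step before advantages are estimated, and the LSTM would supply the recurrent state summarizing previously observed obstacles.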

Original language: English
Article number: 479
Journal: Journal of Marine Science and Engineering
Volume: 13
Issue number: 3
Publication status: Published - Mar 2025

Keywords

  • USV
  • deep reinforcement learning
  • obstacle avoidance
  • proximal policy optimization
  • reward function
