Double Broad Reinforcement Learning Based on Hindsight Experience Replay for Collision Avoidance of Unmanned Surface Vehicles

Jiabao Yu, Jiawei Chen, Ying Chen, Zhiguo Zhou*, Junwei Duan*

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

1 Citation (Scopus)

Abstract

Although broad reinforcement learning (BRL) provides a more intelligent autonomous decision-making method for the collision avoidance problem of unmanned surface vehicles (USVs), the algorithm still has the problem of over-estimation and has difficulty converging quickly due to the sparse reward problem in a large area of sea. To overcome the dilemma, we propose a double broad reinforcement learning based on hindsight experience replay (DBRL-HER) for the collision avoidance system of USVs to improve the efficiency and accuracy of decision-making. The algorithm decouples the two steps of target action selection and target Q value calculation to form the double broad reinforcement learning method and then adopts hindsight experience replay to allow the agent to learn from the experience of failure in order to greatly improve the sample utilization efficiency. Through training in a grid environment, the collision avoidance success rate of the proposed algorithm was found to be 31.9 percentage points higher than that in the deep Q network (DQN) and 24.4 percentage points higher than that in BRL. A Unity 3D simulation platform with high fidelity was also designed to simulate the movement of USVs. An experiment on the platform fully verified the effectiveness of the proposed algorithm.

Original languageEnglish
Article number2026
JournalJournal of Marine Science and Engineering
Volume10
Issue number12
DOIs
Publication statusPublished - Dec 2022

Keywords

  • broad reinforcement learning
  • collision avoidance
  • reinforcement learning

Fingerprint

Dive into the research topics of 'Double Broad Reinforcement Learning Based on Hindsight Experience Replay for Collision Avoidance of Unmanned Surface Vehicles'. Together they form a unique fingerprint.

Cite this