Action decoupled SAC reinforcement learning with discrete-continuous hybrid action spaces

Yahao Xu, Yiran Wei*, Keyang Jiang, Li Chen, Di Wang, Hongbin Deng

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

19 Citations (Scopus)

Abstract

Most existing Deep Reinforcement Learning (DRL) algorithms apply only to discrete or only to continuous action spaces. However, an agent often has both continuous and discrete actions, a setting known as a hybrid action space. This paper proposes an action-decoupled algorithm for hybrid action spaces. Specifically, the hybrid action is decoupled, and the original agent in the hybrid action space is abstracted into two agents, each with only a discrete or only a continuous action space. The discrete and continuous actions are independent of each other and are executed simultaneously. We use the Soft Actor-Critic (SAC) algorithm as the optimization method and name the proposed algorithm Action Decoupled SAC (AD-SAC). We handle the resulting multi-agent problem using the Centralized Training Decentralized Execution (CTDE) framework and reduce the concatenation of partial agent observations to avoid interference from redundant observations. We design a hybrid action space environment for Unmanned Aerial Vehicle (UAV) path planning and gimbal scanning using AirSim. The results show that our algorithm has better convergence and robustness than the discretization, relaxation, and Parametrized Deep Q-Networks Learning (P-DQN) algorithms. Finally, we carry out a Hardware in the Loop (HITL) simulation experiment based on Pixhawk to verify the feasibility of our algorithm.
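
The decoupling idea can be illustrated with a minimal sketch. This is not the authors' implementation: it only shows a discrete part and a continuous part being sampled independently and then combined into one hybrid action. The names `HybridAction`, `sample_discrete`, `sample_continuous`, and `act` are hypothetical, the logits, means, and log standard deviations are placeholder numbers standing in for the outputs of two policy networks, and the tanh-squashed Gaussian is the form commonly used with SAC rather than a detail confirmed by the abstract.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class HybridAction:
    discrete: int      # index of the chosen discrete action
    continuous: list   # continuous action vector, each entry in [-1, 1]


def sample_discrete(logits, rng):
    """Sample an index from the softmax distribution over `logits`."""
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def sample_continuous(means, log_stds, rng):
    """Sample a Gaussian per dimension and squash it with tanh."""
    return [
        math.tanh(rng.gauss(mu, math.exp(log_std)))
        for mu, log_std in zip(means, log_stds)
    ]


def act(discrete_logits, cont_means, cont_log_stds, rng):
    """Sample both parts independently and combine them into one hybrid action."""
    return HybridAction(
        discrete=sample_discrete(discrete_logits, rng),
        continuous=sample_continuous(cont_means, cont_log_stds, rng),
    )


# Placeholder policy outputs (assumed values, not taken from the paper).
rng = random.Random(0)
action = act([0.1, 2.0, -1.0], [0.0, 0.5], [-1.0, -1.0], rng)
print(action)
```

In this sketch neither sampling function looks at the other's output, which mirrors the statement that the two actions are independent and executed simultaneously. The centralized training part of CTDE is not shown.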

Original language: English
Pages (from-to): 141-151
Number of pages: 11
Journal: Neurocomputing
Volume: 537
Publication status: Published - 7 Jun 2023

Keywords

  • Hybrid action space
  • Reinforcement learning
  • SAC
  • Visual perception
