MDPo: Offline Reinforcement Learning Based on Mixture Density Policy Network

Chen Liu, Yizhuo Wang*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Offline reinforcement learning aims to enable agents to derive effective policies for decision-making tasks from a pre-existing dataset. This learning paradigm has broad application prospects in areas with strict safety constraints, such as healthcare and robotic control. However, existing offline reinforcement learning algorithms often overlook the impact of the dataset's inherent multimodal distribution on policy optimization during training, which compromises model performance. To tackle this challenge, we introduce MDPo (Mixture Density Policy), a novel offline reinforcement learning algorithm based on a mixture density policy network. MDPo first trains the value function with an expectile regression loss. It then constructs the policy with a mixture density network and trains it under a distributional constraint, learning a high-quality policy model from the combined influence of the reward signal and the policy constraint. By leveraging mixture density networks, MDPo models the policy as a multimodal distribution, enhancing its representational capacity to better fit the multimodal distribution of actions in the dataset, thereby stabilizing training and improving model performance. Experiments on the Antmaze tasks of the D4RL benchmark demonstrate that MDPo significantly outperforms existing state-of-the-art methods and also improves training stability.
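
The core ideas described in the abstract, a policy represented as a mixture density network so it can fit multimodal action distributions, plus an expectile-style value loss, can be illustrated with a short, hedged sketch. This is not the authors' implementation; the network sizes, number of mixture components, the Gaussian parameterization, and the log-likelihood policy-constraint term are assumptions made for illustration only.

```python
# Hedged sketch of a mixture-density policy (NOT the paper's code).
# A state is mapped to K Gaussian components over the action space;
# the policy is their weighted mixture, which can be multimodal.
import torch
import torch.nn as nn
import torch.distributions as D

class MixtureDensityPolicy(nn.Module):
    def __init__(self, state_dim, action_dim, n_components=5, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.logits = nn.Linear(hidden, n_components)                  # mixture weights
        self.means = nn.Linear(hidden, n_components * action_dim)      # component means
        self.log_stds = nn.Linear(hidden, n_components * action_dim)   # component scales
        self.n_components, self.action_dim = n_components, action_dim

    def forward(self, state):
        h = self.trunk(state)
        means = self.means(h).view(-1, self.n_components, self.action_dim)
        stds = self.log_stds(h).view(-1, self.n_components, self.action_dim).clamp(-5, 2).exp()
        mix = D.Categorical(logits=self.logits(h))
        comp = D.Independent(D.Normal(means, stds), 1)
        return D.MixtureSameFamily(mix, comp)

def policy_constraint_loss(policy, states, actions):
    # Illustrative distributional constraint: maximize the mixture's
    # log-likelihood of dataset actions, keeping the policy close to
    # the behavior distribution in the offline dataset.
    return -policy(states).log_prob(actions).mean()

def expectile_loss(td_diff, tau=0.7):
    # Asymmetric squared loss used in expectile regression of the value
    # function (tau is an assumed hyperparameter, not taken from the paper).
    weight = torch.where(td_diff > 0, tau, 1.0 - tau)
    return (weight * td_diff.pow(2)).mean()
```

In this sketch the multimodality comes from `MixtureSameFamily`: sampling first picks a component, then draws an action from that Gaussian, so the policy can place probability mass on several distinct action modes present in the dataset.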

Original language: English
Title of host publication: Proceedings of 2024 International Conference on Generative Artificial Intelligence and Information Security, GAIIS 2024
Publisher: Association for Computing Machinery
Pages: 132-137
Number of pages: 6
ISBN (Electronic): 9798400709562
DOIs
Publication status: Published - 10 May 2024
Event: 2024 International Conference on Generative Artificial Intelligence and Information Security, GAIIS 2024 - Kuala Lumpur, Malaysia
Duration: 10 May 2024 - 12 May 2024

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 2024 International Conference on Generative Artificial Intelligence and Information Security, GAIIS 2024
Country/Territory: Malaysia
City: Kuala Lumpur
Period: 10/05/24 - 12/05/24

Keywords

  • Multimodal distribution
  • Offline reinforcement learning
  • Reinforcement learning
