POCE: Primal Policy Optimization with Conservative Estimation for Multi-constraint Offline Reinforcement Learning

Jiayi Guan, Li Shen, Ao Zhou, Lusong Li, Han Hu, Xiaodong He, Guang Chen*, Changjun Jiang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract

Multi-constraint offline reinforcement learning (RL) promises to learn, from offline datasets, policies that satisfy both cumulative and state-wise cost constraints. This setting offers an effective route to applying RL in high-risk scenarios where cumulative and state-wise costs must be considered simultaneously. However, previous constrained offline RL algorithms are primarily designed for single-constraint problems involving only cumulative cost, and they face challenges on multi-constraint tasks that involve both cumulative and state-wise costs. In this work, we propose a novel Primal policy Optimization with Conservative Estimation algorithm (POCE) to address the problem of multi-constraint offline RL. Concretely, we reframe the objective of multi-constraint offline RL by introducing the concept of Maximum Markov Decision Processes (MMDP). Subsequently, we present a primal policy optimization algorithm to confront the multi-constraint problems, which improves the stability and convergence speed of model training. Furthermore, we propose a conditional Bellman operator to estimate cumulative and state-wise Q-values, reducing the extrapolation error caused by out-of-distribution (OOD) actions. Finally, extensive experiments demonstrate that POCE achieves competitive performance across multiple experimental tasks, particularly outperforming baseline algorithms in terms of safety. Our code is available on GitHub (POCE).
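To make the setting concrete, here is a minimal formalization sketched in our own notation (the paper's exact formulation may differ; r, c, h, \kappa, and \epsilon are assumed symbols for the reward, cumulative cost, state-wise cost, cost budget, and state-wise bound):

% Sketch only: assumed notation, not reproduced from the POCE paper.
\max_{\pi} \; \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t \ge 0} \gamma^{t}\, r(s_t, a_t)\Big]
\quad \text{s.t.} \quad
\mathbb{E}_{\tau \sim \pi}\Big[\sum_{t \ge 0} \gamma^{t}\, c(s_t, a_t)\Big] \le \kappa,
\qquad
\max_{t \ge 0} \, h(s_t) \le \epsilon

Under this reading, the MMDP reframing replaces the summation backup for the state-wise cost with a max-style recursion, e.g. V_h(s_t) = \max\big(h(s_t), \mathbb{E}[V_h(s_{t+1})]\big), so that both constraint types admit Bellman-style value estimation within a single primal optimization.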

Original language: English
Title of host publication: Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Publisher: IEEE Computer Society
Pages: 26233-26243
Number of pages: 11
ISBN (Electronic): 9798350353006
DOIs
Publication status: Published - 2024
Event: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 - Seattle, United States
Duration: 16 Jun 2024 – 22 Jun 2024

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (Print): 1063-6919

Conference

Conference: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Country/Territory: United States
City: Seattle
Period: 16/06/24 – 22/06/24

Keywords

  • Offline Reinforcement Learning; Constrained Reinforcement Learning; Multi-constraint
