TY - GEN
T1 - POCE: Primal Policy Optimization with Conservative Estimation for Multi-constraint Offline Reinforcement Learning
T2 - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
AU - Guan, Jiayi
AU - Shen, Li
AU - Zhou, Ao
AU - Li, Lusong
AU - Hu, Han
AU - He, Xiaodong
AU - Chen, Guang
AU - Jiang, Changjun
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Multi-constraint offline reinforcement learning (RL) promises to learn policies that satisfy both cumulative and state-wise costs from offline datasets. This arrangement provides an effective approach for the widespread application of RL in high-risk scenarios where both cumulative and state-wise costs need to be considered simultaneously. However, previous constrained offline RL algorithms are primarily designed to handle single-constraint problems related to cumulative cost and face challenges when addressing multi-constraint tasks that involve both cumulative and state-wise costs. In this work, we propose a novel Primal policy Optimization with Conservative Estimation algorithm (POCE) to address the problem of multi-constraint offline RL. Concretely, we reframe the objective of multi-constraint offline RL by introducing the concept of Maximum Markov Decision Processes (MMDP). Subsequently, we present a primal policy optimization algorithm to confront the multi-constraint problems, which improves the stability and convergence speed of model training. Furthermore, we propose a conditional Bellman operator to estimate cumulative and state-wise Q-values, reducing the extrapolation error caused by out-of-distribution (OOD) actions. Finally, extensive experiments demonstrate that the POCE algorithm achieves competitive performance across multiple experimental tasks, particularly outperforming baseline algorithms in terms of safety. Our code is available at github.POCE.
AB - Multi-constraint offline reinforcement learning (RL) promises to learn policies that satisfy both cumulative and state-wise costs from offline datasets. This arrangement provides an effective approach for the widespread application of RL in high-risk scenarios where both cumulative and state-wise costs need to be considered simultaneously. However, previous constrained offline RL algorithms are primarily designed to handle single-constraint problems related to cumulative cost and face challenges when addressing multi-constraint tasks that involve both cumulative and state-wise costs. In this work, we propose a novel Primal policy Optimization with Conservative Estimation algorithm (POCE) to address the problem of multi-constraint offline RL. Concretely, we reframe the objective of multi-constraint offline RL by introducing the concept of Maximum Markov Decision Processes (MMDP). Subsequently, we present a primal policy optimization algorithm to confront the multi-constraint problems, which improves the stability and convergence speed of model training. Furthermore, we propose a conditional Bellman operator to estimate cumulative and state-wise Q-values, reducing the extrapolation error caused by out-of-distribution (OOD) actions. Finally, extensive experiments demonstrate that the POCE algorithm achieves competitive performance across multiple experimental tasks, particularly outperforming baseline algorithms in terms of safety. Our code is available at github.POCE.
KW - Offline Reinforcement Learning
KW - Constrained Reinforcement Learning
KW - Multi-constraint
UR - http://www.scopus.com/inward/record.url?scp=85196703602&partnerID=8YFLogxK
U2 - 10.1109/CVPR52733.2024.02479
DO - 10.1109/CVPR52733.2024.02479
M3 - Conference contribution
AN - SCOPUS:85196703602
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 26233
EP - 26243
BT - Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
PB - IEEE Computer Society
Y2 - 16 June 2024 through 22 June 2024
ER -