TY - JOUR
T1 - Differential High Order Control Barrier Function-Based Safe Reinforcement Learning
AU - Kong, Xiangyu
AU - Xia, Yuanqing
AU - Sun, Zhongqi
AU - Zhai, Di Hua
AU - Deng, Yunshan
AU - Zhang, Sihua
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2025
Y1 - 2025
N2 - Safe reinforcement learning (RL) aims to learn a policy while also satisfying safety constraints. An increasingly common approach is to design a safety filter for the RL policy based on a control barrier function (CBF) or a high-order control barrier function (HOCBF). A quadratic program (QP) is then formulated and solved to modify the RL policy, enabling safe exploration. However, directly integrating the safety filter with RL presents two challenges: (1) the conservativeness of the safe policy, and (2) the potential infeasibility of the QP under bounded input constraints. These issues limit the performance of safe RL. In this letter, we introduce a differential HOCBF constraint by incorporating neural-network-based penalty functions into the HOCBF. Furthermore, we propose a differential HOCBF-based safe RL framework in which the penalty functions and the RL policy are trained concurrently. To address conservativeness, we train the penalty functions to maximize long-term rewards while preventing abrupt changes in the safe action, thereby achieving ideal performance. To ensure the feasibility of the formulated QP under bounded input constraints, we compute a set for the penalty functions and prove that feasibility is guaranteed as long as the learned penalty functions remain within this set. Finally, we verify the effectiveness of the proposed framework on a wheeled mobile robot navigation and obstacle-avoidance task.
AB - Safe reinforcement learning (RL) aims to learn a policy while also satisfying safety constraints. An increasingly common approach is to design a safety filter for the RL policy based on a control barrier function (CBF) or a high-order control barrier function (HOCBF). A quadratic program (QP) is then formulated and solved to modify the RL policy, enabling safe exploration. However, directly integrating the safety filter with RL presents two challenges: (1) the conservativeness of the safe policy, and (2) the potential infeasibility of the QP under bounded input constraints. These issues limit the performance of safe RL. In this letter, we introduce a differential HOCBF constraint by incorporating neural-network-based penalty functions into the HOCBF. Furthermore, we propose a differential HOCBF-based safe RL framework in which the penalty functions and the RL policy are trained concurrently. To address conservativeness, we train the penalty functions to maximize long-term rewards while preventing abrupt changes in the safe action, thereby achieving ideal performance. To ensure the feasibility of the formulated QP under bounded input constraints, we compute a set for the penalty functions and prove that feasibility is guaranteed as long as the learned penalty functions remain within this set. Finally, we verify the effectiveness of the proposed framework on a wheeled mobile robot navigation and obstacle-avoidance task.
KW - Reinforcement learning (RL)
KW - collision avoidance
KW - robot safety
UR - http://www.scopus.com/inward/record.url?scp=105007366011&partnerID=8YFLogxK
U2 - 10.1109/LRA.2025.3575310
DO - 10.1109/LRA.2025.3575310
M3 - Article
AN - SCOPUS:105007366011
SN - 2377-3766
VL - 10
SP - 7524
EP - 7531
JO - IEEE Robotics and Automation Letters
JF - IEEE Robotics and Automation Letters
IS - 7
ER -