TY - JOUR
T1 - Differentiable Logic Policy for Interpretable Deep Reinforcement Learning
T2 - A Study From an Optimization Perspective
AU - Li, Xin
AU - Lei, Haojie
AU - Zhang, Li
AU - Wang, Mingzhong
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023/10/1
Y1 - 2023/10/1
N2 - The interpretability of policies remains an important challenge in Deep Reinforcement Learning (DRL). This paper explores interpretable DRL by representing the policy with Differentiable Inductive Logic Programming (DILP) and provides a theoretical and empirical study of DILP-based policy learning from an optimization perspective. We first identify the fundamental fact that DILP-based policy learning should be solved as a constrained policy optimization problem. We then propose using Mirror Descent for policy optimization (MDPO) to handle the constraints of DILP-based policies. We derive a closed-form regret bound for MDPO with function approximation, which is helpful for the design of DRL frameworks. Moreover, we study the convexity of DILP-based policies to further verify the benefits gained from MDPO. Empirically, we experiment with MDPO, its on-policy variant, and three mainstream policy learning methods, and the results verify our theoretical analysis.
AB - The interpretability of policies remains an important challenge in Deep Reinforcement Learning (DRL). This paper explores interpretable DRL by representing the policy with Differentiable Inductive Logic Programming (DILP) and provides a theoretical and empirical study of DILP-based policy learning from an optimization perspective. We first identify the fundamental fact that DILP-based policy learning should be solved as a constrained policy optimization problem. We then propose using Mirror Descent for policy optimization (MDPO) to handle the constraints of DILP-based policies. We derive a closed-form regret bound for MDPO with function approximation, which is helpful for the design of DRL frameworks. Moreover, we study the convexity of DILP-based policies to further verify the benefits gained from MDPO. Empirically, we experiment with MDPO, its on-policy variant, and three mainstream policy learning methods, and the results verify our theoretical analysis.
KW - Deep reinforcement learning
KW - interpretable reinforcement learning
KW - machine learning
KW - policy optimization
UR - http://www.scopus.com/inward/record.url?scp=85162617581&partnerID=8YFLogxK
U2 - 10.1109/TPAMI.2023.3285634
DO - 10.1109/TPAMI.2023.3285634
M3 - Article
C2 - 37310843
AN - SCOPUS:85162617581
SN - 0162-8828
VL - 45
SP - 11654
EP - 11667
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS - 10
ER -