Differentiable Logic Policy for Interpretable Deep Reinforcement Learning: A Study From an Optimization Perspective

Xin Li*, Haojie Lei, Li Zhang, Mingzhong Wang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)

Abstract

The interpretability of policies remains an important challenge in Deep Reinforcement Learning (DRL). This paper explores interpretable DRL by representing policies with Differentiable Inductive Logic Programming (DILP) and provides a theoretical and empirical study of DILP-based policy learning from an optimization perspective. We first identify the fundamental fact that DILP-based policy learning should be solved as a constrained policy optimization problem. We then propose to use Mirror Descent for policy optimization (MDPO) to handle the constraints on DILP-based policies. We derive a closed-form regret bound for MDPO with function approximation, which is helpful for the design of DRL frameworks. Moreover, we study the convexity of DILP-based policies to further verify the benefits gained from MDPO. Empirically, we evaluate MDPO, its on-policy variant, and three mainstream policy learning methods, and the results verify our theoretical analysis.
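For context, the generic mirror descent policy update with a KL-divergence Bregman term is sketched below; this is the standard form of MDPO from the policy optimization literature, not necessarily the exact formulation in the paper, and the symbols \(A^{\pi_k}\), \(\eta_k\), and \(\Pi\) follow conventional usage rather than the paper's notation. Replacing the Euclidean proximity term of plain gradient ascent with a KL term keeps each iterate inside the admissible (constrained) policy class:

\[
\pi_{k+1} = \arg\max_{\pi \in \Pi} \; \mathbb{E}_{s \sim \rho_{\pi_k}} \Big[ \mathbb{E}_{a \sim \pi(\cdot \mid s)} \big[ A^{\pi_k}(s, a) \big] - \tfrac{1}{\eta_k} \, \mathrm{KL}\big( \pi(\cdot \mid s) \,\|\, \pi_k(\cdot \mid s) \big) \Big],
\]

where \(A^{\pi_k}\) is the advantage function of the current policy \(\pi_k\), \(\eta_k\) is the step size, and \(\Pi\) denotes the constrained policy class (here, policies representable by a DILP program).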

Original language: English
Pages (from-to): 11654-11667
Number of pages: 14
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 45
Issue number: 10
DOIs
Publication status: Published - 1 Oct 2023

Keywords

  • Deep reinforcement learning
  • interpretable reinforcement learning
  • machine learning
  • policy optimization
