Differentiable Logic Policy for Interpretable Deep Reinforcement Learning: A Study From an Optimization Perspective

Xin Li*, Haojie Lei, Li Zhang, Mingzhong Wang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)

Abstract

The interpretability of policies remains an important challenge in Deep Reinforcement Learning (DRL). This paper explores interpretable DRL by representing the policy with Differentiable Inductive Logic Programming (DILP) and provides a theoretical and empirical study of DILP-based policy learning from an optimization perspective. We first identify that DILP-based policy learning should be solved as a constrained policy optimization problem. We then propose to use Mirror Descent for policy optimization (MDPO) to handle the constraints of DILP-based policies. We derive a closed-form regret bound for MDPO with function approximation, which is helpful for the design of DRL frameworks. Moreover, we study the convexity of DILP-based policies to further verify the benefits gained from MDPO. Empirically, we experiment with MDPO, its on-policy variant, and three mainstream policy learning methods, and the results verify our theoretical analysis.
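To illustrate why mirror descent suits the constraints of a DILP-based policy, the following is a minimal sketch (not the paper's implementation): a tabular mirror-descent policy update with a KL Bregman divergence, which reduces to an exponentiated-gradient step that keeps each action distribution on the probability simplex by construction. The advantage estimates, step size, and toy dimensions below are placeholder assumptions for illustration only.

import numpy as np

def mirror_descent_policy_update(pi, advantages, step_size=0.1):
    """One mirror-descent (KL / exponentiated-gradient) update of a tabular policy.

    pi:         (num_states, num_actions) array; each row is a distribution on the simplex.
    advantages: (num_states, num_actions) estimated advantages A(s, a) under the current policy.
    Returns the updated policy; the simplex constraint is preserved by construction.
    """
    # Exponentiated-gradient step: pi_new(a|s) is proportional to pi(a|s) * exp(step_size * A(s, a)).
    logits = np.log(pi + 1e-12) + step_size * advantages
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    new_pi = np.exp(logits)
    new_pi /= new_pi.sum(axis=1, keepdims=True)      # renormalize onto the simplex
    return new_pi

# Toy usage: 2 states, 3 actions, placeholder advantage estimates.
pi = np.full((2, 3), 1.0 / 3.0)
adv = np.array([[0.5, -0.2, 0.0],
                [0.1, 0.3, -0.4]])
pi = mirror_descent_policy_update(pi, adv)
print(pi, pi.sum(axis=1))  # rows remain valid probability distributions

Because the Bregman projection under the KL divergence is a simple renormalization, no explicit projection step onto the constraint set is needed, which is the practical appeal of mirror descent in this constrained setting.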

Original language: English
Pages (from-to): 11654-11667
Number of pages: 14
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 45
Issue number: 10
DOI: 10.1109/TPAMI.2023.3285634
Publication status: Published - 1 Oct 2023


Cite this

Li, X., Lei, H., Zhang, L., & Wang, M. (2023). Differentiable Logic Policy for Interpretable Deep Reinforcement Learning: A Study From an Optimization Perspective. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(10), 11654-11667. https://doi.org/10.1109/TPAMI.2023.3285634