Differentiable Logic Policy for Interpretable Deep Reinforcement Learning: A Study From an Optimization Perspective

Xin Li*, Haojie Lei, Li Zhang, Mingzhong Wang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)

Abstract

The interpretability of policies remains an important challenge in Deep Reinforcement Learning (DRL). This paper explores interpretable DRL by representing policies with Differentiable Inductive Logic Programming (DILP) and provides a theoretical and empirical study of DILP-based policy learning from an optimization perspective. We first identify the fundamental fact that DILP-based policy learning should be solved as a constrained policy optimization problem. We then propose to use Mirror Descent for policy optimization (MDPO) to handle the constraints imposed by DILP-based policies. We derive a closed-form regret bound for MDPO with function approximation, which is helpful for the design of DRL frameworks. Moreover, we study the convexity of DILP-based policies to further verify the benefits gained from MDPO. Empirically, we evaluate MDPO, its on-policy variant, and three mainstream policy learning methods, and the results verify our theoretical analysis.
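
For context, the mirror descent policy update referred to in the abstract generally takes a KL-regularized form. The following is a minimal sketch of the standard MDPO step under common assumptions (the step size t_k and the advantage estimate A^{\pi_k} are notation of this illustration, not necessarily the exact formulation used in the paper):

\pi_{k+1} = \arg\max_{\pi \in \Pi} \; \mathbb{E}_{s \sim \rho_{\pi_k}} \Big[ \mathbb{E}_{a \sim \pi(\cdot \mid s)} \big[ A^{\pi_k}(s, a) \big] \;-\; \tfrac{1}{t_k} \, \mathrm{KL}\big( \pi(\cdot \mid s) \,\|\, \pi_k(\cdot \mid s) \big) \Big]

Here the KL term keeps each update close to the previous policy within the feasible class \Pi, which is why mirror descent is a natural fit for the constrained optimization problem induced by DILP-based policies.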

Original language: English
Pages (from-to): 11654-11667
Number of pages: 14
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 45
Issue number: 10
DOI
Publication status: Published - 1 Oct 2023
