Differentiable Logic Policy for Interpretable Deep Reinforcement Learning: A Study From an Optimization Perspective

Xin Li*, Haojie Lei, Li Zhang, Mingzhong Wang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)

Abstract

The interpretability of policies remains an important challenge in Deep Reinforcement Learning (DRL). This paper explores interpretable DRL by representing policies with Differentiable Inductive Logic Programming (DILP) and provides a theoretical and empirical study of DILP-based policy learning from an optimization perspective. We first identify the fundamental fact that DILP-based policy learning should be solved as a constrained policy optimization problem. We then propose to use Mirror Descent for policy optimization (MDPO) to handle the constraints on DILP-based policies. We derive a closed-form regret bound for MDPO with function approximation, which is helpful for the design of DRL frameworks. Moreover, we study the convexity of DILP-based policies to further verify the benefits gained from MDPO. Empirically, we evaluate MDPO, its on-policy variant, and three mainstream policy learning methods, and the results verify our theoretical analysis.
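For context, the generic mirror descent policy update with a KL-divergence Bregman term is sketched below; this is the standard form of MDPO from the policy optimization literature, not necessarily the exact formulation in the paper, and the symbols \(A^{\pi_k}\), \(\eta_k\), and \(\Pi\) follow conventional usage rather than the paper's notation. Replacing the Euclidean proximity term of plain gradient ascent with a KL term keeps each iterate inside the admissible (constrained) policy class:

\[
\pi_{k+1} = \arg\max_{\pi \in \Pi} \; \mathbb{E}_{s \sim \rho_{\pi_k}} \Big[ \mathbb{E}_{a \sim \pi(\cdot \mid s)} \big[ A^{\pi_k}(s, a) \big] - \tfrac{1}{\eta_k} \, \mathrm{KL}\big( \pi(\cdot \mid s) \,\|\, \pi_k(\cdot \mid s) \big) \Big],
\]

where \(A^{\pi_k}\) is the advantage function of the current policy \(\pi_k\), \(\eta_k\) is the step size, and \(\Pi\) denotes the constrained policy class (here, policies representable by a DILP program).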

Original language: English
Pages (from-to): 11654-11667
Number of pages: 14
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 45
Issue number: 10
DOIs
Publication status: Published - 1 Oct 2023

Keywords

  • Deep reinforcement learning
  • interpretable reinforcement learning
  • machine learning
  • policy optimization
