Differentiable Logic Policy for Interpretable Deep Reinforcement Learning: A Study From an Optimization Perspective

Xin Li*, Haojie Lei, Li Zhang, Mingzhong Wang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)

Abstract

The interpretability of policies remains an important challenge in Deep Reinforcement Learning (DRL). This paper explores interpretable DRL by representing the policy with Differentiable Inductive Logic Programming (DILP) and provides a theoretical and empirical study of DILP-based policy learning from an optimization perspective. We first identify that DILP-based policy learning should be solved as a constrained policy optimization problem. We then propose to use Mirror Descent for policy optimization (MDPO) to handle the constraints of DILP-based policies. We derive a closed-form regret bound for MDPO with function approximation, which is helpful for the design of DRL frameworks. Moreover, we study the convexity of DILP-based policies to further verify the benefits gained from MDPO. Empirically, we experiment with MDPO, its on-policy variant, and three mainstream policy learning methods, and the results verify our theoretical analysis.
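To illustrate why mirror descent suits the constraints of a DILP-based policy, the following is a minimal sketch (not the paper's implementation): a tabular mirror-descent policy update with a KL Bregman divergence, which reduces to an exponentiated-gradient step that keeps each action distribution on the probability simplex by construction. The advantage estimates, step size, and toy dimensions below are placeholder assumptions for illustration only.

import numpy as np

def mirror_descent_policy_update(pi, advantages, step_size=0.1):
    """One mirror-descent (KL / exponentiated-gradient) update of a tabular policy.

    pi:         (num_states, num_actions) array; each row is a distribution on the simplex.
    advantages: (num_states, num_actions) estimated advantages A(s, a) under the current policy.
    Returns the updated policy; the simplex constraint is preserved by construction.
    """
    # Exponentiated-gradient step: pi_new(a|s) is proportional to pi(a|s) * exp(step_size * A(s, a)).
    logits = np.log(pi + 1e-12) + step_size * advantages
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    new_pi = np.exp(logits)
    new_pi /= new_pi.sum(axis=1, keepdims=True)      # renormalize onto the simplex
    return new_pi

# Toy usage: 2 states, 3 actions, placeholder advantage estimates.
pi = np.full((2, 3), 1.0 / 3.0)
adv = np.array([[0.5, -0.2, 0.0],
                [0.1, 0.3, -0.4]])
pi = mirror_descent_policy_update(pi, adv)
print(pi, pi.sum(axis=1))  # rows remain valid probability distributions

Because the Bregman projection under the KL divergence is a simple renormalization, no explicit projection step onto the constraint set is needed, which is the practical appeal of mirror descent in this constrained setting.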

Original language: English
Pages (from-to): 11654-11667
Number of pages: 14
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 45
Issue number: 10
DOI: 10.1109/TPAMI.2023.3285634
Publication status: Published - 1 Oct 2023


Cite this

Li, X., Lei, H., Zhang, L., & Wang, M. (2023). Differentiable Logic Policy for Interpretable Deep Reinforcement Learning: A Study From an Optimization Perspective. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(10), 11654-11667. https://doi.org/10.1109/TPAMI.2023.3285634