Off-Policy Differentiable Logic Reinforcement Learning

Li Zhang, Xin Li*, Mingzhong Wang, Andong Tian

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

8 Citations (Scopus)

Abstract

In this paper, we propose an Off-Policy Differentiable Logic Reinforcement Learning (OPDLRL) framework that inherits the interpretability and generalization ability of Differentiable Inductive Logic Programming (DILP) while resolving its weaknesses in execution efficiency, stability, and scalability. The key contributions include the use of approximate inference to significantly reduce the number of logic rules in the deduction process, an off-policy training method to enable approximate inference, and a distributed and hierarchical training framework. Extensive experiments, specifically playing real-time video games in Rabbids against human players, show that OPDLRL achieves performance better than or comparable to other DILP-based methods while being far more practical in terms of sample efficiency and execution efficiency, making it applicable to complex and (near) real-time domains.
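To make the "approximate inference" idea concrete: one plausible reading is that, instead of summing over every candidate logic rule at each differentiable forward-chaining step, only the top-k highest-weighted rules are kept. The minimal sketch below illustrates that scheme only; it is not the paper's actual implementation, and all names (deduce_step, rule_outputs, top_k) and the softmax/top-k pruning details are assumptions for illustration.

# Hypothetical sketch of top-k approximate inference in a DILP-style
# deduction step. Not the OPDLRL authors' code; names and the pruning
# scheme are illustrative assumptions.
import numpy as np

def deduce_step(valuation, rule_outputs, rule_logits, top_k=4):
    """One differentiable forward-chaining step.

    valuation:    (n_facts,) current truth values in [0, 1].
    rule_outputs: (n_rules, n_facts) truth values each candidate rule
                  would derive from the current valuation.
    rule_logits:  (n_rules,) learnable rule weights.
    top_k:        keep only the k highest-weighted rules, approximating
                  the full weighted sum over all candidate rules.
    """
    weights = np.exp(rule_logits - rule_logits.max())
    weights /= weights.sum()                    # softmax over candidate rules
    keep = np.argsort(weights)[-top_k:]         # prune low-weight rules
    w = weights[keep] / weights[keep].sum()     # renormalize survivors
    derived = (w[:, None] * rule_outputs[keep]).sum(axis=0)
    # Amalgamate with the previous valuation via probabilistic sum.
    return valuation + derived - valuation * derived

# Toy usage: 3 facts, 5 candidate rules, keep the 2 strongest rules.
rng = np.random.default_rng(0)
v = rng.random(3)
outs = rng.random((5, 3))
logits = rng.standard_normal(5)
print(deduce_step(v, outs, logits, top_k=2))

Pruning makes the deduction step cheaper but no longer matches the distribution the full rule set would induce, which is consistent with the abstract's point that an off-policy training method is needed to make such approximate inference workable.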

Original language: English
Host publication: Machine Learning and Knowledge Discovery in Databases. Research Track - European Conference, ECML PKDD 2021, Proceedings
Editors: Nuria Oliver, Fernando Pérez-Cruz, Stefan Kramer, Jesse Read, Jose A. Lozano
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 617-632
Number of pages: 16
ISBN (Print): 9783030865191
DOI
Publication status: Published - 2021
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2021 - Virtual, Online
Duration: 13 Sep 2021 - 17 Sep 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12976 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2021
Location: Virtual, Online
Period: 13/09/21 - 17/09/21
