Off-Policy Differentiable Logic Reinforcement Learning

Li Zhang, Xin Li*, Mingzhong Wang, Andong Tian

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

8 Citations (Scopus)

Abstract

In this paper, we proposed an Off-Policy Differentiable Logic Reinforcement Learning (OPDLRL) framework to inherit the benefits of interpretability and generalization ability in Differentiable Inductive Logic Programming (DILP) and also resolves its weakness of execution efficiency, stability, and scalability. The key contributions include the use of approximate inference to significantly reduce the number of logic rules in the deduction process, an off-policy training method to enable approximate inference, and a distributed and hierarchical training framework. Extensive experiments, specifically playing real-time video games in Rabbids against human players, show that OPDLRL has better or similar performance as other DILP-based methods but far more practical in terms of sample efficiency and execution efficiency, making it applicable to complex and (near) real-time domains.

Original languageEnglish
Title of host publicationMachine Learning and Knowledge Discovery in Databases. Research Track - European Conference, ECML PKDD 2021, Proceedings
EditorsNuria Oliver, Fernando Pérez-Cruz, Stefan Kramer, Jesse Read, Jose A. Lozano
PublisherSpringer Science and Business Media Deutschland GmbH
Pages617-632
Number of pages16
ISBN (Print)9783030865191
DOIs
Publication statusPublished - 2021
EventEuropean Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2021 - Virtual, Online
Duration: 13 Sept 202117 Sept 2021

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume12976 LNAI
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

ConferenceEuropean Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2021
CityVirtual, Online
Period13/09/2117/09/21

Keywords

  • Deep reinforcement learning
  • Interpretable reinforcement learning
  • Neural-Symbolic AI

Fingerprint

Dive into the research topics of 'Off-Policy Differentiable Logic Reinforcement Learning'. Together they form a unique fingerprint.

Cite this