Off-Policy Differentiable Logic Reinforcement Learning

Li Zhang, Xin Li*, Mingzhong Wang, Andong Tian

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

8 Citations (Scopus)

Abstract

In this paper, we propose an Off-Policy Differentiable Logic Reinforcement Learning (OPDLRL) framework that inherits the interpretability and generalization ability of Differentiable Inductive Logic Programming (DILP) while resolving its weaknesses in execution efficiency, stability, and scalability. The key contributions include the use of approximate inference to significantly reduce the number of logic rules in the deduction process, an off-policy training method to enable approximate inference, and a distributed and hierarchical training framework. Extensive experiments, specifically playing real-time video games in Rabbids against human players, show that OPDLRL achieves performance better than or comparable to other DILP-based methods while being far more practical in terms of sample efficiency and execution efficiency, making it applicable to complex and (near) real-time domains.
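To make the "approximate inference" idea concrete: one plausible reading is that, instead of summing over every candidate logic rule at each differentiable forward-chaining step, only the top-k highest-weighted rules are kept. The minimal sketch below illustrates that scheme only; it is not the paper's actual implementation, and all names (deduce_step, rule_outputs, top_k) and the softmax/top-k pruning details are assumptions for illustration.

# Hypothetical sketch of top-k approximate inference in a DILP-style
# deduction step. Not the OPDLRL authors' code; names and the pruning
# scheme are illustrative assumptions.
import numpy as np

def deduce_step(valuation, rule_outputs, rule_logits, top_k=4):
    """One differentiable forward-chaining step.

    valuation:    (n_facts,) current truth values in [0, 1].
    rule_outputs: (n_rules, n_facts) truth values each candidate rule
                  would derive from the current valuation.
    rule_logits:  (n_rules,) learnable rule weights.
    top_k:        keep only the k highest-weighted rules, approximating
                  the full weighted sum over all candidate rules.
    """
    weights = np.exp(rule_logits - rule_logits.max())
    weights /= weights.sum()                    # softmax over candidate rules
    keep = np.argsort(weights)[-top_k:]         # prune low-weight rules
    w = weights[keep] / weights[keep].sum()     # renormalize survivors
    derived = (w[:, None] * rule_outputs[keep]).sum(axis=0)
    # Amalgamate with the previous valuation via probabilistic sum.
    return valuation + derived - valuation * derived

# Toy usage: 3 facts, 5 candidate rules, keep the 2 strongest rules.
rng = np.random.default_rng(0)
v = rng.random(3)
outs = rng.random((5, 3))
logits = rng.standard_normal(5)
print(deduce_step(v, outs, logits, top_k=2))

Pruning makes the deduction step cheaper but no longer matches the distribution the full rule set would induce, which is consistent with the abstract's point that an off-policy training method is needed to make such approximate inference workable.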

Original language: English
Host publication: Machine Learning and Knowledge Discovery in Databases. Research Track - European Conference, ECML PKDD 2021, Proceedings
Editors: Nuria Oliver, Fernando Pérez-Cruz, Stefan Kramer, Jesse Read, Jose A. Lozano
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 617-632
Number of pages: 16
ISBN (Print): 9783030865191
DOI
Publication status: Published - 2021
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2021 - Virtual, Online
Duration: 13 Sep 2021 - 17 Sep 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12976 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2021
Location: Virtual, Online
Period: 13/09/21 - 17/09/21
