Self-Attention based Temporal Intrinsic Reward for Reinforcement Learning

Zhuo Jiang; Daiying Tian; Qingkai Yang; Zhihong Peng

doi:10.1109/CAC53003.2021.9727314

Zhuo Jiang, Daiying Tian, Qingkai Yang, Zhihong Peng^*

^*Corresponding author for this work

School of Automation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This paper proposes a self-attention-based temporal intrinsic reward model for reinforcement learning (RL) that synthesizes a control policy for an agent facing sparse rewards in partially observable environments. The approach addresses the temporal credit assignment problem to some extent and mitigates inefficient exploration. We first introduce a sequence-based self-attention mechanism to generate the temporary features, which capture the temporal properties of the task for the agent. During training, the temporary features are used in each sampled episode to construct intrinsic rewards, which are combined with the extrinsic reward to help the agent learn a feasible policy. We then update the intrinsic reward model with a meta-gradient so that the agent achieves better performance. Experiments demonstrate the advantages of the proposed method.
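The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no architectural details, so the single attention head, the causal mask, the `tanh` reward head, the mixing weight `beta`, and every function and variable name below are assumptions made for illustration. The meta-gradient update of the reward model is omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(seq, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over one episode.

    seq has shape (T, d_in). A causal mask (an assumption, not stated in
    the abstract) lets step t attend only to steps <= t.
    """
    q, k, v = seq @ w_q, seq @ w_k, seq @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    return softmax(scores) @ v  # shape (T, d_k)


def intrinsic_rewards(features, w_r):
    """Map each step's feature vector to a bounded scalar reward (hypothetical head)."""
    return np.tanh(features @ w_r)  # shape (T,)


def combined_rewards(extrinsic, intrinsic, beta=0.1):
    """Add the weighted intrinsic reward to the (sparse) extrinsic reward."""
    return extrinsic + beta * intrinsic


rng = np.random.default_rng(0)
T, d_in, d_k = 6, 4, 8
episode = rng.normal(size=(T, d_in))  # stand-in for per-step observations
w_q, w_k, w_v = (rng.normal(size=(d_in, d_k)) for _ in range(3))
w_r = rng.normal(size=d_k)

extrinsic = np.zeros(T)
extrinsic[-1] = 1.0  # sparse reward: only the final step is rewarded

features = self_attention(episode, w_q, w_k, w_v)
r_in = intrinsic_rewards(features, w_r)
r_total = combined_rewards(extrinsic, r_in)
print(r_total.shape)
```

In this sketch every step receives a dense learning signal even though the extrinsic reward is nonzero only at the last step, which is the general motivation for intrinsic rewards under sparse feedback.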

Original language	English
Title of host publication	Proceeding - 2021 China Automation Congress, CAC 2021
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	2022-2026
Number of pages	5
ISBN (Electronic)	9781665426473
DOIs	https://doi.org/10.1109/CAC53003.2021.9727314
Publication status	Published - 2021
Event	2021 China Automation Congress, CAC 2021 - Beijing, China; Duration: 22 Oct 2021 → 24 Oct 2021

Publication series

Name	Proceeding - 2021 China Automation Congress, CAC 2021

Conference

Conference	2021 China Automation Congress, CAC 2021
Country/Territory	China
City	Beijing
Period	22/10/21 → 24/10/21

Keywords

intrinsic motivation
reinforcement learning
self-attention
sparse reward

Cite this

Jiang, Z., Tian, D., Yang, Q., & Peng, Z. (2021). Self-Attention based Temporal Intrinsic Reward for Reinforcement Learning. In Proceeding - 2021 China Automation Congress, CAC 2021 (pp. 2022-2026). (Proceeding - 2021 China Automation Congress, CAC 2021). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CAC53003.2021.9727314

@inproceedings{7a6d6c7be06b401db7074772adce9d09,
  title = "Self-Attention based Temporal Intrinsic Reward for Reinforcement Learning",
  abstract = "This paper proposes a self-attention based temporal intrinsic reward model for reinforcement learning (RL), to synthesize the control policy for the agent constrained by the sparse reward in partially observable environments. This approach can solve the problem of temporal credit assignment to some extent and deal with the low efficiency of exploration. We first introduce a sequence-based self-attention mechanism to generate the temporary features, which can effectively capture the temporal property of the task for the agent. During the training process, the temporary features are employed in each sampled episode to elaborate the intrinsic rewards, which is combined with the extrinsic reward to help the agent learn a feasible policy. Then we use the meta-gradient to update this intrinsic reward model in order that the agent can achieve better performance. Experiments are given to demonstrate the superiority of the proposed method.",
  keywords = "intrinsic motivation, reinforcement learning, self-attention, sparse reward",
  author = "Zhuo Jiang and Daiying Tian and Qingkai Yang and Zhihong Peng",
  note = "Publisher Copyright: {\textcopyright} 2021 IEEE; 2021 China Automation Congress, CAC 2021 ; Conference date: 22-10-2021 Through 24-10-2021",
  year = "2021",
  doi = "10.1109/CAC53003.2021.9727314",
  language = "English",
  series = "Proceeding - 2021 China Automation Congress, CAC 2021",
  publisher = "Institute of Electrical and Electronics Engineers Inc.",
  pages = "2022--2026",
  booktitle = "Proceeding - 2021 China Automation Congress, CAC 2021",
  address = "United States",
}

Jiang, Z, Tian, D, Yang, Q & Peng, Z 2021, Self-Attention based Temporal Intrinsic Reward for Reinforcement Learning. in Proceeding - 2021 China Automation Congress, CAC 2021. Proceeding - 2021 China Automation Congress, CAC 2021, Institute of Electrical and Electronics Engineers Inc., pp. 2022-2026, 2021 China Automation Congress, CAC 2021, Beijing, China, 22/10/21. https://doi.org/10.1109/CAC53003.2021.9727314

Self-Attention based Temporal Intrinsic Reward for Reinforcement Learning. / Jiang, Zhuo; Tian, Daiying; Yang, Qingkai et al.
Proceeding - 2021 China Automation Congress, CAC 2021. Institute of Electrical and Electronics Engineers Inc., 2021. p. 2022-2026 (Proceeding - 2021 China Automation Congress, CAC 2021).

TY - GEN
T1 - Self-Attention based Temporal Intrinsic Reward for Reinforcement Learning
AU - Jiang, Zhuo
AU - Tian, Daiying
AU - Yang, Qingkai
AU - Peng, Zhihong
PY - 2021
Y1 - 2021
AB - This paper proposes a self-attention based temporal intrinsic reward model for reinforcement learning (RL), to synthesize the control policy for the agent constrained by the sparse reward in partially observable environments. This approach can solve the problem of temporal credit assignment to some extent and deal with the low efficiency of exploration. We first introduce a sequence-based self-attention mechanism to generate the temporary features, which can effectively capture the temporal property of the task for the agent. During the training process, the temporary features are employed in each sampled episode to elaborate the intrinsic rewards, which is combined with the extrinsic reward to help the agent learn a feasible policy. Then we use the meta-gradient to update this intrinsic reward model in order that the agent can achieve better performance. Experiments are given to demonstrate the superiority of the proposed method.
KW - intrinsic motivation
KW - reinforcement learning
KW - self-attention
KW - sparse reward
UR - http://www.scopus.com/inward/record.url?scp=85128037803&partnerID=8YFLogxK
U2 - 10.1109/CAC53003.2021.9727314
DO - 10.1109/CAC53003.2021.9727314
M3 - Conference contribution
AN - SCOPUS:85128037803
T3 - Proceeding - 2021 China Automation Congress, CAC 2021
SP - 2022
EP - 2026
BT - Proceeding - 2021 China Automation Congress, CAC 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 China Automation Congress, CAC 2021
Y2 - 22 October 2021 through 24 October 2021
ER -
