Self-Attention based Temporal Intrinsic Reward for Reinforcement Learning

Zhuo Jiang, Daiying Tian, Qingkai Yang, Zhihong Peng*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This paper proposes a self-attention based temporal intrinsic reward model for reinforcement learning (RL) to synthesize a control policy for an agent constrained by sparse rewards in partially observable environments. The approach partially addresses the temporal credit assignment problem and the low efficiency of exploration. We first introduce a sequence-based self-attention mechanism to generate temporal features, which capture the temporal properties of the task for the agent. During training, the temporal features are used in each sampled episode to construct intrinsic rewards, which are combined with the extrinsic reward to help the agent learn a feasible policy. We then use the meta-gradient to update the intrinsic reward model so that the agent achieves better performance. Experiments demonstrate the superiority of the proposed method.
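The reward construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the causal attention (each step attends only to itself and earlier steps), the `tanh` output head, and the mixing weight `beta` are all assumptions made for illustration, and the meta-gradient update of the reward model is omitted.

```python
import math

def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def intrinsic_rewards(episode, Wq, Wk, Wv, w_out):
    """Hypothetical intrinsic reward model.

    episode: list of per-step feature vectors for one sampled episode.
    Wq, Wk, Wv: query/key/value projection matrices (assumed parameters).
    w_out: vector mapping an attended feature to a scalar reward (assumed).
    Returns one intrinsic reward per step, each in (-1, 1).
    """
    q = [matvec(Wq, x) for x in episode]
    k = [matvec(Wk, x) for x in episode]
    v = [matvec(Wv, x) for x in episode]
    scale = math.sqrt(len(k[0]))
    rewards = []
    for t in range(len(episode)):
        # Assumption: causal attention over steps 0..t only.
        scores = [sum(a * b for a, b in zip(q[t], k[s])) / scale
                  for s in range(t + 1)]
        attn = softmax(scores)
        # Attention-weighted sum of values: the per-step sequence feature.
        feat = [sum(attn[s] * v[s][j] for s in range(t + 1))
                for j in range(len(v[0]))]
        rewards.append(math.tanh(sum(w * f for w, f in zip(w_out, feat))))
    return rewards

def mixed_rewards(r_ext, r_int, beta=0.1):
    # Combine extrinsic and intrinsic rewards; beta is an assumed weight.
    return [re + beta * ri for re, ri in zip(r_ext, r_int)]

# Usage: a 3-step episode with a sparse extrinsic reward at the last step.
identity = [[1.0, 0.0], [0.0, 1.0]]
episode = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
r_int = intrinsic_rewards(episode, identity, identity, identity, [1.0, -1.0])
r_mix = mixed_rewards([0.0, 0.0, 1.0], r_int, beta=0.5)
```

In this sketch the intrinsic term gives every step a learning signal even when the extrinsic reward is zero, which is the role the abstract assigns to it. In a trainable version the projection matrices would be the parameters updated by the meta-gradient.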

Original language: English
Title of host publication: Proceeding - 2021 China Automation Congress, CAC 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2022-2026
Number of pages: 5
ISBN (Electronic): 9781665426473
Publication status: Published - 2021
Event: 2021 China Automation Congress, CAC 2021 - Beijing, China
Duration: 22 Oct 2021 to 24 Oct 2021

Publication series

Name: Proceeding - 2021 China Automation Congress, CAC 2021

Conference

Conference: 2021 China Automation Congress, CAC 2021
Country/Territory: China
City: Beijing
Period: 22/10/21 to 24/10/21

Keywords

  • intrinsic motivation
  • reinforcement learning
  • self-attention
  • sparse reward
