Self-Attention based Temporal Intrinsic Reward for Reinforcement Learning

Zhuo Jiang, Daiying Tian, Qingkai Yang, Zhihong Peng*

*Corresponding author of this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This paper proposes a self-attention based temporal intrinsic reward model for reinforcement learning (RL), to synthesize the control policy for an agent constrained by sparse rewards in partially observable environments. This approach alleviates the temporal credit assignment problem to some extent and addresses the low efficiency of exploration. We first introduce a sequence-based self-attention mechanism to generate temporal features, which effectively capture the temporal property of the task for the agent. During training, the temporal features are employed in each sampled episode to elaborate the intrinsic rewards, which are combined with the extrinsic reward to help the agent learn a feasible policy. Then we use the meta-gradient to update this intrinsic reward model so that the agent achieves better performance. Experiments are given to demonstrate the superiority of the proposed method.
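Since the abstract only sketches the approach, the following is a minimal illustrative sketch of the core idea: a self-attention module applied over a sampled episode to produce per-step intrinsic rewards, which are then mixed with the extrinsic reward before the policy update. All module names, layer sizes, and the mixing weight beta are illustrative assumptions, not the authors' implementation; the paper additionally updates the intrinsic reward model with meta-gradients, which is not shown here.

import torch
import torch.nn as nn

class TemporalIntrinsicReward(nn.Module):
    """Sketch of a sequence-based self-attention intrinsic reward model.

    Maps an episode's observation sequence to a scalar intrinsic reward
    per time step. Sizes and head counts are assumptions for illustration.
    """

    def __init__(self, obs_dim: int, embed_dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, embed_dim)            # per-step embedding
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.reward_head = nn.Linear(embed_dim, 1)               # scalar intrinsic reward

    def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, T, obs_dim) -- one sampled episode per batch row
        h = self.encoder(obs_seq)
        # Self-attention over the whole episode captures temporal dependencies,
        # yielding the "temporal features" described in the abstract.
        temporal_feat, _ = self.attn(h, h, h)
        return self.reward_head(temporal_feat).squeeze(-1)       # (batch, T)

# Usage sketch: mix intrinsic and extrinsic rewards for the policy update.
# model = TemporalIntrinsicReward(obs_dim=16)
# r_int = model(obs_batch)              # (batch, T) intrinsic rewards
# r_total = r_ext + beta * r_int        # beta: hypothetical mixing weight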

Original language: English
Title of host publication: Proceeding - 2021 China Automation Congress, CAC 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2022-2026
Number of pages: 5
ISBN (Electronic): 9781665426473
DOI
Publication status: Published - 2021
Event: 2021 China Automation Congress, CAC 2021 - Beijing, China
Duration: 22 Oct 2021 → 24 Oct 2021

Publication series

Name: Proceeding - 2021 China Automation Congress, CAC 2021

Conference

Conference: 2021 China Automation Congress, CAC 2021
Country/Territory: China
City: Beijing
Period: 22/10/21 → 24/10/21
