Random curiosity-driven exploration in deep reinforcement learning

Jing Li; Xinxin Shi; Jiehao Li; Xin Zhang; Junzheng Wang

doi:10.1016/j.neucom.2020.08.024

Jing Li, Xinxin Shi, Jiehao Li^*, Xin Zhang, Junzheng Wang

^* Corresponding author for this work

School of Automation

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

61 Citations (Scopus)

Abstract

Reinforcement learning (RL) depends on carefully engineered environment rewards. However, environment rewards are extremely sparse in many RL tasks, which makes it challenging for the agent to learn skills and interact with the environment. One solution to this problem is to create intrinsic rewards for agents that make rewards dense and more suitable for learning. Recent algorithms, such as curiosity-driven exploration, usually estimate the novelty of the next state through the prediction error of dynamics models. However, these methods are typically limited by the capacity of their dynamics models. In this paper, a random curiosity-driven model using deep reinforcement learning is proposed, which uses a target network with fixed weights to maintain the stability of dynamics models and create more suitable intrinsic rewards. We integrate a parametric exploration method to further promote sufficient exploration. In addition, a deeper and more closely connected network is used to encode the pixel images for the policy gradient. In comparisons against previous approaches in several environments, the experiments show that our method achieves state-of-the-art performance on most, but not all, of the Atari games.
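The abstract describes the mechanism only at a high level. A dynamics model's prediction error serves as an intrinsic reward, and a target network with fixed weights keeps that signal stable. The Python sketch below illustrates one plausible reading, under stated assumptions. A frozen, randomly initialised encoder plays the role of the fixed-weight target network, and a linear forward model is trained to predict the encoded next state from the encoded current state and the action. The class name `RandomFeatureCuriosity`, the tanh/linear layers, the learning rate and the weighting `beta` are hypothetical illustration choices, not the authors' architecture. The parametric exploration and policy-gradient components are not modelled.

```python
import math
import random


class RandomFeatureCuriosity:
    """Hypothetical sketch: forward-model curiosity in a frozen random feature space."""

    def __init__(self, obs_dim, n_actions, feat_dim=8, lr=0.05, seed=0):
        rng = random.Random(seed)
        # Fixed-weight target encoder phi: sampled once and never updated, so the
        # dynamics model always regresses onto the same (stationary) targets.
        self.enc_w = [[rng.gauss(0.0, 1.0) for _ in range(obs_dim)]
                      for _ in range(feat_dim)]
        # Linear forward model: [phi(s), one_hot(a)] -> predicted phi(s').
        self.fwd_w = [[0.0] * (feat_dim + n_actions) for _ in range(feat_dim)]
        self.n_actions = n_actions
        self.lr = lr

    def encode(self, obs):
        """phi(s) = tanh(W_enc s) with a frozen random W_enc."""
        return [math.tanh(sum(w * x for w, x in zip(row, obs))) for row in self.enc_w]

    def _inputs(self, obs, action):
        one_hot = [1.0 if i == action else 0.0 for i in range(self.n_actions)]
        return self.encode(obs) + one_hot

    def _predict(self, z):
        return [sum(w * x for w, x in zip(row, z)) for row in self.fwd_w]

    def intrinsic_reward(self, obs, action, next_obs):
        """Squared error between predicted and actual next-state features."""
        pred = self._predict(self._inputs(obs, action))
        target = self.encode(next_obs)
        return sum((p - t) ** 2 for p, t in zip(pred, target))

    def update(self, obs, action, next_obs):
        """One SGD step on the forward model only; the encoder stays frozen."""
        z = self._inputs(obs, action)
        target = self.encode(next_obs)
        for row, t in zip(self.fwd_w, target):
            err = sum(w * x for w, x in zip(row, z)) - t
            for j, x in enumerate(z):
                row[j] -= self.lr * 2.0 * err * x


def combined_reward(extrinsic, intrinsic, beta=0.01):
    """Dense training reward: sparse extrinsic reward plus a scaled curiosity bonus."""
    return extrinsic + beta * intrinsic


if __name__ == "__main__":
    cur = RandomFeatureCuriosity(obs_dim=4, n_actions=2)
    s, s_next = [1.0, 0.0, 0.5, -0.5], [0.5, 0.5, 0.0, -1.0]
    before = cur.intrinsic_reward(s, 0, s_next)
    for _ in range(200):
        cur.update(s, 0, s_next)
    after = cur.intrinsic_reward(s, 0, s_next)
    novel = cur.intrinsic_reward(s, 1, [-1.0, 1.0, 0.0, 0.5])
    print(f"seen before: {before:.4f}, seen after: {after:.2e}, novel: {novel:.4f}")
    print("training reward:", combined_reward(0.0, novel))
```

Because the encoder never changes, the forward model regresses onto fixed targets. Repeated training on one transition drives its intrinsic reward towards zero, while a transition the model has not fitted typically keeps a larger prediction error. In a real agent the encoder and forward model would be deep networks trained on batches of pixel observations.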

Original language	English
Pages (from-to)	139-147
Number of pages	9
Journal	Neurocomputing
Volume	418
DOI	https://doi.org/10.1016/j.neucom.2020.08.024
Publication status	Published - 22 Dec 2020

Cite this

@article{f46c43a50b2743e0a83254de4b51d5b5,
  title = "Random curiosity-driven exploration in deep reinforcement learning",
  abstract = "Reinforcement learning (RL) depends on carefully engineering environment rewards. However, rewards from environments are extremely sparse for many RL tasks, challenging for the agent to learn skills and interact with the environment. One solution to this problem is to create intrinsic rewards for agents and to make rewards dense and more suitable for learning. Recent algorithms, such as curiosity-driven exploration, usually estimate the novelty of the next state through the prediction error of dynamics models. However, these methods are typically limited by the capacity of their dynamics models. In this paper, a random curiosity-driven model using deep reinforcement learning is proposed, which uses a target network with fixed weights to maintain the stability of dynamics models and create more suitable intrinsic rewards. We integrate the parametric exploration method for further promoting sufficient exploration. Besides, a deeper and more closely connected network is utilized for encoding the pixel images for policy-gradient. By comparing our method against the previous approaches in several environments, the experiments show that our method achieves state-of-the-art performance on most but not all of the Atari games.",
  keywords = "Curiosity-driven exploration, Deep reinforcement learning, Intrinsic rewards",
  author = "Jing Li and Xinxin Shi and Jiehao Li and Xin Zhang and Junzheng Wang",
  note = "Publisher Copyright: {\textcopyright} 2020 Elsevier B.V.",
  year = "2020",
  month = dec,
  day = "22",
  doi = "10.1016/j.neucom.2020.08.024",
  language = "English",
  volume = "418",
  pages = "139--147",
  journal = "Neurocomputing",
  issn = "0925-2312",
  publisher = "Elsevier B.V.",
}

TY  - JOUR
T1  - Random curiosity-driven exploration in deep reinforcement learning
AU  - Li, Jing
AU  - Shi, Xinxin
AU  - Li, Jiehao
AU  - Zhang, Xin
AU  - Wang, Junzheng
PY  - 2020/12/22
Y1  - 2020/12/22
N2  - Reinforcement learning (RL) depends on carefully engineering environment rewards. However, rewards from environments are extremely sparse for many RL tasks, challenging for the agent to learn skills and interact with the environment. One solution to this problem is to create intrinsic rewards for agents and to make rewards dense and more suitable for learning. Recent algorithms, such as curiosity-driven exploration, usually estimate the novelty of the next state through the prediction error of dynamics models. However, these methods are typically limited by the capacity of their dynamics models. In this paper, a random curiosity-driven model using deep reinforcement learning is proposed, which uses a target network with fixed weights to maintain the stability of dynamics models and create more suitable intrinsic rewards. We integrate the parametric exploration method for further promoting sufficient exploration. Besides, a deeper and more closely connected network is utilized for encoding the pixel images for policy-gradient. By comparing our method against the previous approaches in several environments, the experiments show that our method achieves state-of-the-art performance on most but not all of the Atari games.
AB  - Reinforcement learning (RL) depends on carefully engineering environment rewards. However, rewards from environments are extremely sparse for many RL tasks, challenging for the agent to learn skills and interact with the environment. One solution to this problem is to create intrinsic rewards for agents and to make rewards dense and more suitable for learning. Recent algorithms, such as curiosity-driven exploration, usually estimate the novelty of the next state through the prediction error of dynamics models. However, these methods are typically limited by the capacity of their dynamics models. In this paper, a random curiosity-driven model using deep reinforcement learning is proposed, which uses a target network with fixed weights to maintain the stability of dynamics models and create more suitable intrinsic rewards. We integrate the parametric exploration method for further promoting sufficient exploration. Besides, a deeper and more closely connected network is utilized for encoding the pixel images for policy-gradient. By comparing our method against the previous approaches in several environments, the experiments show that our method achieves state-of-the-art performance on most but not all of the Atari games.
KW  - Curiosity-driven exploration
KW  - Deep reinforcement learning
KW  - Intrinsic rewards
UR  - http://www.scopus.com/inward/record.url?scp=85092115902&partnerID=8YFLogxK
U2  - 10.1016/j.neucom.2020.08.024
DO  - 10.1016/j.neucom.2020.08.024
M3  - Article
AN  - SCOPUS:85092115902
SN  - 0925-2312
VL  - 418
SP  - 139
EP  - 147
JO  - Neurocomputing
JF  - Neurocomputing
ER  - 
