Abstract
Hierarchical Reinforcement Learning (HRL) is primarily proposed to address problems with sparse reward signals and long time horizons. Many existing HRL algorithms use neural networks to generate goals automatically, but they do not take into account that not all goals advance the task. Some methods have addressed the optimization of goal generation, but they represent goals as specific state values. In this paper, we propose a novel HRL algorithm for automatically discovering goals, which optimizes goal generation in the latent state space by screening goals and selecting policies. We compare our approach with state-of-the-art algorithms on Atari 2600 games, and the results show that it speeds up learning and improves performance.
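The abstract's core idea, that candidate goals proposed in a latent state space should be screened because not all of them advance the task, can be illustrated with a minimal sketch. Everything below is a hypothetical illustration under assumed components (the `encode` projection, the `propose_goals` stub, and the toy `value_fn`), not the paper's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(state: np.ndarray) -> np.ndarray:
    """Hypothetical encoder: project a raw state into a latent space.
    Stands in for a learned network; here a fixed averaging projection."""
    W = np.ones((4, state.shape[0])) / state.shape[0]
    return W @ state

def propose_goals(latent_state: np.ndarray, n: int = 8) -> np.ndarray:
    """High-level policy stub: sample candidate goals near the current
    latent state (assumed behavior, not the paper's generator)."""
    return latent_state + rng.normal(scale=0.5, size=(n, latent_state.shape[0]))

def value_fn(z: np.ndarray) -> float:
    """Toy stand-in for a learned value estimate over latent states:
    latent points near the all-ones vector are treated as 'good'."""
    return float(-np.linalg.norm(z - 1.0))

def screen_goals(goals, latent_state, value_fn, min_advantage=0.0):
    """Keep only candidate goals whose estimated value exceeds the current
    state's value -- a simple proxy for 'not all goals advance the task'."""
    baseline = value_fn(latent_state)
    return [g for g in goals if value_fn(g) - baseline > min_advantage]

state = rng.normal(size=16)
z = encode(state)
candidates = propose_goals(z)
useful = screen_goals(candidates, z, value_fn)
# A low-level policy would then be conditioned on one of the surviving goals.
print(f"{len(useful)} of {len(candidates)} candidate goals passed screening")
```

In this sketch the screening criterion is a value-advantage threshold; the paper's own criterion and policy-selection mechanism are not specified in the abstract.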
| Original language | English |
| ---|--- |
| Pages (from-to) | 18049-18060 |
| Number of pages | 12 |
| Journal | Applied Intelligence |
| Volume | 52 |
| Issue | 15 |
| DOI | |
| Publication status | Published - December 2022 |