Boosting Feedback Efficiency of Interactive Reinforcement Learning by Adaptive Learning from Scores

Shukai Liu, Chenming Wu*, Ying Li, Liangjun Zhang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Interactive reinforcement learning has shown promise in learning complex robotic tasks. However, the process can be human-intensive due to the requirement of a large amount of interactive feedback. This paper presents a new method that uses scores provided by humans instead of pairwise preferences to improve the feedback efficiency of interactive reinforcement learning. Our key insight is that scores can yield significantly more data than pairwise preferences. Specifically, we require a teacher to interactively score the full trajectories of an agent to train a behavioral policy in a sparse reward environment. To avoid unstable scores given by humans negatively impacting the training process, we propose an adaptive learning scheme. This enables the learning paradigm to be insensitive to imperfect or unreliable scores. We extensively evaluate our method on robotic locomotion and manipulation tasks. The results show that the proposed method can efficiently learn near-optimal policies by adaptive learning from scores while requiring less feedback compared to pairwise preference learning methods. The source code is publicly available at https://github.com/SSKKai/Interactive-Scoring-IRL.
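
One way to see why scores can yield more data than pairwise preferences: n scored trajectories imply n(n-1)/2 comparisons, whereas each pairwise query yields only one. The sketch below illustrates that counting argument only. It is not the authors' adaptive learning scheme, and the function name `pairs_from_scores` and the `tolerance` parameter are hypothetical, not taken from the paper or its repository.

```python
from itertools import combinations


def pairs_from_scores(scores, tolerance=0.0):
    """Derive pairwise preference labels from per-trajectory scores.

    Returns (i, j, label) tuples: label is 1.0 if trajectory i scored higher,
    0.0 if trajectory j scored higher, and 0.5 when the two scores differ by
    no more than `tolerance` (a hypothetical knob for treating near-equal,
    possibly noisy scores as ties).
    """
    labelled = []
    for i, j in combinations(range(len(scores)), 2):
        diff = scores[i] - scores[j]
        if abs(diff) <= tolerance:
            label = 0.5
        elif diff > 0:
            label = 1.0
        else:
            label = 0.0
        labelled.append((i, j, label))
    return labelled


# Four scored trajectories give six implied comparisons.
scores = [2.0, 7.5, 7.0, 4.0]
pairs = pairs_from_scores(scores, tolerance=0.5)
print(len(pairs))  # 6
```

Under this assumption, four score queries already cover six trajectory pairs, and the gap widens quadratically as more trajectories are scored.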

Original language: English
Title of host publication: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 7561-7567
Number of pages: 7
ISBN (electronic): 9781665491907
Publication status: Published - 2023
Event: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023 - Detroit, United States
Duration: 1 Oct 2023 to 5 Oct 2023

Publication series

Name: IEEE International Conference on Intelligent Robots and Systems
ISSN (print): 2153-0858
ISSN (electronic): 2153-0866

Conference

Conference: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023
Country/Territory: United States
City: Detroit
Period: 1/10/23 to 5/10/23
