Boosting Feedback Efficiency of Interactive Reinforcement Learning by Adaptive Learning from Scores

Shukai Liu; Chenming Wu; Ying Li; Liangjun Zhang

doi:10.1109/IROS55552.2023.10341990

Shukai Liu, Chenming Wu^*, Ying Li, Liangjun Zhang

^*Corresponding author for this work

School of Mechanical and Vehicle Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

Interactive reinforcement learning has shown promise in learning complex robotic tasks. However, the process can be human-intensive due to the requirement of a large amount of interactive feedback. This paper presents a new method that uses scores provided by humans instead of pairwise preferences to improve the feedback efficiency of interactive reinforcement learning. Our key insight is that scores can yield significantly more data than pairwise preferences. Specifically, we require a teacher to interactively score the full trajectories of an agent to train a behavioral policy in a sparse reward environment. To avoid unstable scores given by humans negatively impacting the training process, we propose an adaptive learning scheme. This enables the learning paradigm to be insensitive to imperfect or unreliable scores. We extensively evaluate our method for robotic locomotion and manipulation tasks. The results show that the proposed method can efficiently learn near-optimal policies by adaptive learning from scores while requiring less feedback compared to pairwise preference learning methods. The source code is publicly available at https://github.com/SSKKai/Interactive-Scoring-IRL.
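One way to read the abstract's key insight is combinatorial: if a teacher scores n trajectories individually, every pair of those trajectories can be ranked after the fact, giving up to n(n-1)/2 comparisons from n pieces of feedback. The sketch below illustrates that counting argument only. It is an assumption about how scores could be turned into preference data, not the authors' implementation; the function name `pairs_from_scores` and the `min_gap` parameter are hypothetical, and `min_gap` is a crude stand-in for tolerating noisy scores, not the adaptive learning scheme the paper proposes.

```python
from itertools import combinations


def pairs_from_scores(scores, min_gap=0.0):
    """Derive pairwise preference labels from per-trajectory scores.

    Returns a list of (i, j, label) tuples, where label is 1.0 if
    trajectory i scored higher than trajectory j and 0.0 otherwise.
    Pairs whose score gap is at most `min_gap` are skipped, so ties
    (and, for larger gaps, near-ties) produce no label.
    """
    pairs = []
    for i, j in combinations(range(len(scores)), 2):
        gap = scores[i] - scores[j]
        if abs(gap) <= min_gap:
            continue  # too close to call; emit no preference
        pairs.append((i, j, 1.0 if gap > 0 else 0.0))
    return pairs


# Four scored trajectories (hypothetical values on an arbitrary scale).
scores = [2.0, 7.5, 5.0, 9.0]
all_pairs = pairs_from_scores(scores)
confident_pairs = pairs_from_scores(scores, min_gap=2.0)
print(len(all_pairs), len(confident_pairs))
```

With four distinct scores, all six possible pairs receive a label, whereas four separate pairwise queries would label only four pairs. Raising `min_gap` trades data volume for robustness to small scoring inconsistencies.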

Original language	English
Title of host publication	2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	7561-7567
Number of pages	7
ISBN (Electronic)	9781665491907
DOI	https://doi.org/10.1109/IROS55552.2023.10341990
Publication status	Published - 2023
Event	2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023 - Detroit, United States; Duration: 1 Oct 2023 → 5 Oct 2023

Publication series

Name	IEEE International Conference on Intelligent Robots and Systems
ISSN (Print)	2153-0858
ISSN (Electronic)	2153-0866

Conference

Conference	2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023
Country/Territory	United States
City	Detroit
Period	1 Oct 2023 → 5 Oct 2023

Cite this

Liu, S., Wu, C., Li, Y., & Zhang, L. (2023). Boosting Feedback Efficiency of Interactive Reinforcement Learning by Adaptive Learning from Scores. In 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023 (pp. 7561-7567). (IEEE International Conference on Intelligent Robots and Systems). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IROS55552.2023.10341990

@inproceedings{6f038ffe3cbe4b7ea8858fc2bee44923,

title = "Boosting Feedback Efficiency of Interactive Reinforcement Learning by Adaptive Learning from Scores",

author = "Shukai Liu and Chenming Wu and Ying Li and Liangjun Zhang",

note = "Publisher Copyright: {\textcopyright} 2023 IEEE.; 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023 ; Conference date: 01-10-2023 Through 05-10-2023",

year = "2023",

doi = "10.1109/IROS55552.2023.10341990",

language = "English",

series = "IEEE International Conference on Intelligent Robots and Systems",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "7561--7567",

booktitle = "2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023",

address = "United States",

}

TY - GEN

T1 - Boosting Feedback Efficiency of Interactive Reinforcement Learning by Adaptive Learning from Scores

AU - Liu, Shukai

AU - Wu, Chenming

AU - Li, Ying

AU - Zhang, Liangjun

PY - 2023

Y1 - 2023

UR - http://www.scopus.com/inward/record.url?scp=85182525943&partnerID=8YFLogxK

U2 - 10.1109/IROS55552.2023.10341990

DO - 10.1109/IROS55552.2023.10341990

M3 - Conference contribution

AN - SCOPUS:85182525943

T3 - IEEE International Conference on Intelligent Robots and Systems

SP - 7561

EP - 7567

BT - 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023

Y2 - 1 October 2023 through 5 October 2023

ER -
