TY - GEN
T1 - HILPS
T2 - 16th IEEE International Conference on Control, Automation, Robotics and Vision, ICARCV 2020
AU - Wen, Mingxing
AU - Yue, Yufeng
AU - Wu, Zhenyu
AU - Mihankhan, Ehsan
AU - Wang, Danwei
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/12/13
Y1 - 2020/12/13
N2 - Reinforcement learning has attracted increasing attention in mapless navigation for mobile robots in recent years. However, obvious challenges remain, including low sample efficiency and safety risks arising from the exploration-exploitation dilemma. This paper addresses these problems by proposing the Human-in-Loop Policy Search (HILPS) framework, which integrates learning from demonstration, learning from human intervention, and a Near Optimal Policy strategy. The former two ensure that expert experience grants the mobile robot more informative and correct decisions for accomplishing the task, while also maintaining its safety through the priority of human control. The Near Optimal Policy (NOP) then selectively stores experience that is similar to the preexisting human demonstrations, improving sample efficiency by eliminating purely exploratory behaviors. To verify the performance of the algorithm, mobile robot navigation experiments are conducted extensively in simulation and in the real world. Results show that HILPS improves sample efficiency and safety in comparison to state-of-the-art reinforcement learning.
AB - Reinforcement learning has attracted increasing attention in mapless navigation for mobile robots in recent years. However, obvious challenges remain, including low sample efficiency and safety risks arising from the exploration-exploitation dilemma. This paper addresses these problems by proposing the Human-in-Loop Policy Search (HILPS) framework, which integrates learning from demonstration, learning from human intervention, and a Near Optimal Policy strategy. The former two ensure that expert experience grants the mobile robot more informative and correct decisions for accomplishing the task, while also maintaining its safety through the priority of human control. The Near Optimal Policy (NOP) then selectively stores experience that is similar to the preexisting human demonstrations, improving sample efficiency by eliminating purely exploratory behaviors. To verify the performance of the algorithm, mobile robot navigation experiments are conducted extensively in simulation and in the real world. Results show that HILPS improves sample efficiency and safety in comparison to state-of-the-art reinforcement learning.
UR - http://www.scopus.com/inward/record.url?scp=85100109886&partnerID=8YFLogxK
U2 - 10.1109/ICARCV50220.2020.9305366
DO - 10.1109/ICARCV50220.2020.9305366
M3 - Conference contribution
AN - SCOPUS:85100109886
T3 - 16th IEEE International Conference on Control, Automation, Robotics and Vision, ICARCV 2020
SP - 387
EP - 392
BT - 16th IEEE International Conference on Control, Automation, Robotics and Vision, ICARCV 2020
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 13 December 2020 through 15 December 2020
ER -