TY - GEN
T1 - Safe reinforcement learning task planning with uncertain duration and resource consumption in limited daytime for lunar rovers
AU - Lu, Siyao
AU - Xu, Rui
AU - Gao, Ai
AU - Li, Zhaoyu
AU - Liu, Jiamou
AU - Zhang, Libo
AU - Zhao, Zhijun
AU - Zhu, Shengying
AU - Li, Yuqiong
N1 - Publisher Copyright:
Copyright © 2024 by the International Astronautical Federation (IAF). All rights reserved.
PY - 2024
Y1 - 2024
N2 - The International Lunar Research Station will initially be established near the lunar south pole using advanced unmanned rovers. Because daytime at the lunar south pole is short, remote control from Earth is inefficient. Moreover, the duration and power consumption of a rover traversing the lunar surface remain uncertain owing to varying payload weights of collected material and changing terrain. In addition, a rover must return to the base before nighttime, when there is no sunlight to provide energy, while the total working time on the Moon must also be optimized. We choose reinforcement learning (RL) to solve this planning problem because of its ability to handle both uncertainty and optimization. However, traditional reinforcement learning cannot guarantee safety under time uncertainty, resource uncertainty, and hard constraints, since its optimization treats constraints as soft. We therefore propose a new approach based on safe reinforcement learning for task planning and resource collection optimization over tasks with uncertain duration and resource consumption. We consider an in-situ material utilization scenario for the lunar base, with tasks of moving, charging, collecting, material delivery, and material receiving, all of which have uncertain execution duration; every task except charging must be completed during the daytime. Because resource collection affects power consumption during movement, the amount collected is decided according to the remaining power. We further propose a reinforcement learning architecture that lets rovers decide the next step instantaneously based on the expected task duration, the remaining time, and the remaining power. The training objective maximizes the amount of material delivered while keeping the rovers safe, working only during the daytime and never depleting the battery. Our experiments show that the approach handles these uncertainties well and leads the rover to finish tasks with less power consumption than traditional planning, and long-term experiments illustrate that the rover always remains safe and moves to charge before nighttime, even with plans generated step by step.
AB - The International Lunar Research Station will initially be established near the lunar south pole using advanced unmanned rovers. Because daytime at the lunar south pole is short, remote control from Earth is inefficient. Moreover, the duration and power consumption of a rover traversing the lunar surface remain uncertain owing to varying payload weights of collected material and changing terrain. In addition, a rover must return to the base before nighttime, when there is no sunlight to provide energy, while the total working time on the Moon must also be optimized. We choose reinforcement learning (RL) to solve this planning problem because of its ability to handle both uncertainty and optimization. However, traditional reinforcement learning cannot guarantee safety under time uncertainty, resource uncertainty, and hard constraints, since its optimization treats constraints as soft. We therefore propose a new approach based on safe reinforcement learning for task planning and resource collection optimization over tasks with uncertain duration and resource consumption. We consider an in-situ material utilization scenario for the lunar base, with tasks of moving, charging, collecting, material delivery, and material receiving, all of which have uncertain execution duration; every task except charging must be completed during the daytime. Because resource collection affects power consumption during movement, the amount collected is decided according to the remaining power. We further propose a reinforcement learning architecture that lets rovers decide the next step instantaneously based on the expected task duration, the remaining time, and the remaining power. The training objective maximizes the amount of material delivered while keeping the rovers safe, working only during the daytime and never depleting the battery. Our experiments show that the approach handles these uncertainties well and leads the rover to finish tasks with less power consumption than traditional planning, and long-term experiments illustrate that the rover always remains safe and moves to charge before nighttime, even with plans generated step by step.
KW - lunar rover
KW - reinforcement learning
KW - resource optimization
KW - uncertainty
UR - http://www.scopus.com/inward/record.url?scp=85218453793&partnerID=8YFLogxK
U2 - 10.52202/078367-0056
DO - 10.52202/078367-0056
M3 - Conference contribution
AN - SCOPUS:85218453793
T3 - Proceedings of the International Astronautical Congress, IAC
SP - 526
EP - 533
BT - IAF Space Operations Symposium - Held at the 75th International Astronautical Congress, IAC 2024
PB - International Astronautical Federation, IAF
T2 - 2024 IAF Space Operations Symposium at the 75th International Astronautical Congress, IAC 2024
Y2 - 14 October 2024 through 18 October 2024
ER -