Safe reinforcement learning task planning with uncertain duration and resource consumption in limited daytime for lunar rovers

Siyao Lu, Rui Xu*, Ai Gao, Zhaoyu Li, Jiamou Liu, Libo Zhang, Zhijun Zhao, Shengying Zhu, Yuqiong Li

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The International Lunar Research Station will initially be established near the lunar south pole using advanced unmanned rovers. Daytime at the south pole is short, so remote control is not efficient enough. Moreover, the duration and power consumption of rover movement on the lunar surface remain uncertain, because the weight of the collected payload varies and the terrain changes along the route. In addition, a lunar rover must return to the base before nightfall, when no sunlight is available to provide energy, and its total working time on the Moon also needs to be optimized. We choose reinforcement learning (RL) to solve this planning problem because it can handle both uncertainty and optimization. However, traditional RL cannot guarantee safety under time uncertainty, resource uncertainty, and constraints, because the constraints enter the optimization only as soft constraints. We therefore propose a safe reinforcement learning approach to task planning and to optimizing resource collection across tasks with uncertain duration and resource collection. We consider an in-situ material utilization scenario for the lunar base with moving, charging, collecting, material delivering, and material receiving tasks. All of these tasks have uncertain execution durations, and every task except charging must be completed during daytime. Resource collection is linked to the power consumed while moving, so the amount collected is decided according to the remaining power. We further propose an RL architecture that lets rovers decide their next step in real time according to the expected task duration, the remaining time, and the remaining power. The training objective is to maximize the amount of material delivered while keeping the rovers safe, so that they work only during daytime and never run out of battery.
In our experiments, the approach is intended to work well under these uncertainties and to lead the rover to finish its tasks with less power consumption than traditional planning. Long-term experiments illustrate that the rover always stays safe and moves to charge before nightfall, even when plans are generated step by step.
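The abstract describes a policy that chooses the next task from the expected task duration, the remaining time, and the remaining power. It must never leave the rover working after nightfall or with an empty battery. One common way to turn such constraints into hard guarantees is action masking (a "shield"). Before the RL policy picks an action, it removes every task whose worst-case cost would prevent a safe return to charging. The sketch below is a hypothetical illustration of that general idea, not the authors' architecture. The `Task` and `RoverState` classes, the fixed pessimistic margins, the fixed return-to-charge cost, and all numbers are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task model: nominal cost plus a pessimistic margin."""
    name: str
    mean_hours: float    # expected duration
    margin_hours: float  # assumed extra time covering duration uncertainty
    mean_wh: float       # expected energy consumption
    margin_wh: float     # assumed extra energy covering consumption uncertainty

    def worst_hours(self) -> float:
        return self.mean_hours + self.margin_hours

    def worst_wh(self) -> float:
        return self.mean_wh + self.margin_wh


@dataclass(frozen=True)
class RoverState:
    daylight_left_hours: float  # time remaining before nightfall
    battery_wh: float           # remaining battery energy


def is_safe(state: RoverState, task: Task, go_charge: Task) -> bool:
    """Allow a task only if, in the worst case, the rover can finish it and
    still reach the charger before night with a non-empty battery.
    Simplifying assumption: the return-to-charge cost is fixed."""
    hours_needed = task.worst_hours() + go_charge.worst_hours()
    wh_needed = task.worst_wh() + go_charge.worst_wh()
    return hours_needed <= state.daylight_left_hours and wh_needed <= state.battery_wh


def select_action(q_values: dict[str, float], state: RoverState,
                  tasks: list[Task], go_charge: Task) -> str:
    """Greedy choice over learned Q-values, restricted to safe tasks.
    Returning to charge is always kept as the fallback action."""
    allowed = [t.name for t in tasks if is_safe(state, t, go_charge)]
    allowed.append(go_charge.name)
    return max(allowed, key=lambda name: q_values.get(name, float("-inf")))


# Illustrative (made-up) numbers.
go_charge = Task("return_to_charge", 1.0, 0.3, 15.0, 5.0)
tasks = [
    Task("move", 1.0, 0.3, 20.0, 5.0),
    Task("collect", 2.0, 0.5, 30.0, 10.0),
    Task("deliver", 0.5, 0.2, 10.0, 3.0),
]
q = {"move": 2.0, "collect": 5.0, "deliver": 3.0, "return_to_charge": 0.5}

print(select_action(q, RoverState(3.0, 60.0), tasks, go_charge))  # deliver
print(select_action(q, RoverState(1.5, 60.0), tasks, go_charge))  # return_to_charge
```

In the first call, "collect" has the highest Q-value. Its worst-case duration plus the return trip (3.8 h) exceeds the 3.0 h of remaining daylight, so the shield removes it and "deliver" is chosen instead. With only 1.5 h left, no working task is safe, and the rover falls back to charging. Masking actions this way keeps the constraint hard, instead of only penalizing violations in the reward.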

Original language: English
Title of host publication: IAF Space Operations Symposium - Held at the 75th International Astronautical Congress, IAC 2024
Publisher: International Astronautical Federation, IAF
Pages: 526-533
Number of pages: 8
ISBN (Electronic): 9798331312183
DOIs: https://doi.org/10.52202/078367-0056
Publication status: Published - 2024
Event: 2024 IAF Space Operations Symposium at the 75th International Astronautical Congress, IAC 2024 - Milan, Italy
Duration: 14 Oct 2024 - 18 Oct 2024

Publication series

Name: Proceedings of the International Astronautical Congress, IAC
ISSN (Print): 0074-1795

Conference

Conference: 2024 IAF Space Operations Symposium at the 75th International Astronautical Congress, IAC 2024
Country/Territory: Italy
City: Milan
Period: 14/10/24 - 18/10/24

Keywords

  • lunar rover
  • reinforcement learning
  • resource optimization
  • uncertainty


Cite this

Lu, S., Xu, R., Gao, A., Li, Z., Liu, J., Zhang, L., Zhao, Z., Zhu, S., & Li, Y. (2024). Safe reinforcement learning task planning with uncertain duration and resource consumption in limited daytime for lunar rovers. In IAF Space Operations Symposium - Held at the 75th International Astronautical Congress, IAC 2024 (pp. 526-533). (Proceedings of the International Astronautical Congress, IAC). International Astronautical Federation, IAF. https://doi.org/10.52202/078367-0056