Safe reinforcement learning task planning with uncertain duration and resource consumption in limited daytime for lunar rovers

Siyao Lu, Rui Xu*, Ai Gao, Zhaoyu Li, Jiamou Liu, Libo Zhang, Zhijun Zhao, Shengying Zhu, Yuqiong Li

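*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract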

In its initial phase, the International Lunar Research Station will be established near the lunar south pole using advanced unmanned rovers. Daytime at the lunar south pole is short, so remote control is not efficient enough. Moreover, the duration and power consumption of a rover moving on the lunar surface are uncertain, because the weight of the collected load varies and the terrain changes along the route. In addition, a rover must return to the base before nighttime, when there is no sunlight to provide energy, and the total working time on the moon also needs to be optimized. We choose reinforcement learning (RL) to solve the planning problem because of its ability to handle uncertainty and optimization. However, traditional reinforcement learning treats constraints as soft constraints in the optimization, so it cannot guarantee safety under time uncertainty, resource uncertainty, and constraints. We therefore propose a safe reinforcement learning approach to task planning and resource collection optimization among tasks with uncertain duration and resource collection. We consider a scenario of in-situ material utilization for the lunar base with tasks of moving, charging, collecting, material delivering, and material receiving. All of these tasks have uncertain execution durations, and every task except charging must be done during the daytime. Resource collection is related to the power consumed in moving, so it is decided according to the remaining power. We further propose a reinforcement learning architecture that lets rovers decide the next step instantaneously according to the expected task duration, the remaining time, and the remaining power. The optimization target in training is to maximize the amount of material delivered while keeping the rovers safe, that is, working only in the daytime and never with an empty battery. In our experiments, the approach works well under these uncertainties and leads the rover to finish tasks with less power consumption than traditional planning. Long-term experiments illustrate that the rover always stays safe and moves to charge before nighttime comes, even with plans generated step by step.

Original language: English
Title of host publication: IAF Space Operations Symposium - Held at the 75th International Astronautical Congress, IAC 2024
Publisher: International Astronautical Federation, IAF
Pages: 526-533
Number of pages: 8
ISBN (Electronic): 9798331312183
DOI: https://doi.org/10.52202/078367-0056
Publication status: Published - 2024
Event: 2024 IAF Space Operations Symposium at the 75th International Astronautical Congress, IAC 2024 - Milan, Italy
Duration: 14 Oct 2024 → 18 Oct 2024

Publication series

Name: Proceedings of the International Astronautical Congress, IAC
ISSN (Print): 0074-1795

Conference

Conference: 2024 IAF Space Operations Symposium at the 75th International Astronautical Congress, IAC 2024
Country/Territory: Italy
City: Milan
Period: 14/10/24 → 18/10/24

Cite this

Lu, S., Xu, R., Gao, A., Li, Z., Liu, J., Zhang, L., Zhao, Z., Zhu, S., & Li, Y. (2024). Safe reinforcement learning task planning with uncertain duration and resource consumption in limited daytime for lunar rovers. In IAF Space Operations Symposium - Held at the 75th International Astronautical Congress, IAC 2024 (pp. 526-533). (Proceedings of the International Astronautical Congress, IAC). International Astronautical Federation, IAF. https://doi.org/10.52202/078367-0056