TY - GEN
T1 - Safe reinforcement learning task planning with uncertain duration and resource consumption in limited daytime for lunar rovers
AU - Lu, Siyao
AU - Xu, Rui
AU - Gao, Ai
AU - Li, Zhaoyu
AU - Liu, Jiamou
AU - Zhang, Libo
AU - Zhao, Zhijun
AU - Zhu, Shengying
AU - Li, Yuqiong
N1 - Publisher Copyright:
Copyright © 2024 by the International Astronautical Federation (IAF). All rights reserved.
PY - 2024
Y1 - 2024
N2 - The International Lunar Research Station will initially be established near the lunar south pole using advanced unmanned rovers. Because daytime at the lunar south pole is short, remote control from Earth is inefficient. Moreover, the duration and power consumption of a rover traversing the lunar surface remain uncertain owing to varying payload weights of collected material and changing terrain. In addition, a rover must return to the base before nighttime, when there is no sunlight to provide energy, while the total working time on the Moon must also be optimized. We choose reinforcement learning (RL) to solve this planning problem because of its ability to handle both uncertainty and optimization. However, traditional reinforcement learning cannot guarantee safety under time uncertainty, resource uncertainty, and hard constraints, since its optimization treats constraints as soft. We therefore propose a new approach based on safe reinforcement learning for task planning and resource collection optimization over tasks with uncertain duration and resource consumption. We consider an in-situ material utilization scenario for the lunar base, with tasks of moving, charging, collecting, material delivery, and material receiving, all of which have uncertain execution duration; every task except charging must be completed during the daytime. Because resource collection affects power consumption during movement, the amount collected is decided according to the remaining power. We further propose a reinforcement learning architecture that lets rovers decide the next step instantaneously based on the expected task duration, the remaining time, and the remaining power. The training objective maximizes the amount of material delivered while keeping the rovers safe, working only during the daytime and never depleting the battery. Our experiments show that the approach handles these uncertainties well and leads the rover to finish tasks with less power consumption than traditional planning, and long-term experiments illustrate that the rover always remains safe and moves to charge before nighttime, even with plans generated step by step.
AB - The International Lunar Research Station will initially be established near the lunar south pole using advanced unmanned rovers. Because daytime at the lunar south pole is short, remote control from Earth is inefficient. Moreover, the duration and power consumption of a rover traversing the lunar surface remain uncertain owing to varying payload weights of collected material and changing terrain. In addition, a rover must return to the base before nighttime, when there is no sunlight to provide energy, while the total working time on the Moon must also be optimized. We choose reinforcement learning (RL) to solve this planning problem because of its ability to handle both uncertainty and optimization. However, traditional reinforcement learning cannot guarantee safety under time uncertainty, resource uncertainty, and hard constraints, since its optimization treats constraints as soft. We therefore propose a new approach based on safe reinforcement learning for task planning and resource collection optimization over tasks with uncertain duration and resource consumption. We consider an in-situ material utilization scenario for the lunar base, with tasks of moving, charging, collecting, material delivery, and material receiving, all of which have uncertain execution duration; every task except charging must be completed during the daytime. Because resource collection affects power consumption during movement, the amount collected is decided according to the remaining power. We further propose a reinforcement learning architecture that lets rovers decide the next step instantaneously based on the expected task duration, the remaining time, and the remaining power. The training objective maximizes the amount of material delivered while keeping the rovers safe, working only during the daytime and never depleting the battery. Our experiments show that the approach handles these uncertainties well and leads the rover to finish tasks with less power consumption than traditional planning, and long-term experiments illustrate that the rover always remains safe and moves to charge before nighttime, even with plans generated step by step.
KW - lunar rover
KW - reinforcement learning
KW - resource optimization
KW - uncertainty
UR - http://www.scopus.com/inward/record.url?scp=85218453793&partnerID=8YFLogxK
U2 - 10.52202/078367-0056
DO - 10.52202/078367-0056
M3 - Conference contribution
AN - SCOPUS:85218453793
T3 - Proceedings of the International Astronautical Congress, IAC
SP - 526
EP - 533
BT - IAF Space Operations Symposium - Held at the 75th International Astronautical Congress, IAC 2024
PB - International Astronautical Federation, IAF
T2 - 2024 IAF Space Operations Symposium at the 75th International Astronautical Congress, IAC 2024
Y2 - 14 October 2024 through 18 October 2024
ER -