Safe reinforcement learning task planning with uncertain duration and resource consumption in limited daytime for lunar rovers

Siyao Lu, Rui Xu^*, Ai Gao, Zhaoyu Li, Jiamou Liu, Libo Zhang, Zhijun Zhao, Shengying Zhu, Yuqiong Li

doi:10.52202/078367-0056

^*Corresponding author for this work

School of Astronautics

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

Abstract

The International Lunar Research Station will be established near the lunar south pole by advanced unmanned rovers in its initial phase. Because daytime at the south pole is short, remote control is not efficient enough. However, the duration and power consumption of a rover's movement on the lunar surface remain uncertain, owing to the varying weight of the collected material it carries and to changes in terrain along its path. Moreover, a lunar rover must return to the base before nightfall, when there is no sunlight to provide energy, while its total working time on the Moon also needs to be optimized. We chose to solve this planning problem with reinforcement learning (RL) because of its ability to handle uncertainty and optimization. However, traditional RL cannot guarantee safety under time uncertainty, resource uncertainty, and constraints, because it treats constraints only as soft terms in the optimization. We therefore propose a safe reinforcement learning approach to task planning and resource-collection optimization across tasks with uncertain duration and resource collection. We consider an in-situ material utilization scenario for the lunar base with five kinds of tasks: moving, charging, collecting, material delivering, and material receiving. All of them have uncertain execution durations, and every task except charging must be completed during daytime. Because the amount of resources collected affects power consumption while moving, collection is decided according to the remaining power. We further propose an RL architecture that lets rovers decide their next step instantaneously according to the expected task duration, the remaining time, and the remaining power. During training, the optimization target is to maximize the amount of material delivered while keeping the rovers safe: they work only during daytime and never run down their batteries.
Our experiments are designed to show that the approach works well under these uncertainties and leads the rover to finish its tasks with less power consumption than traditional planning. Long-term experiments illustrate that the rover always remains safe and moves to charge before nightfall, even when plans are generated step by step.
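The abstract contrasts soft-constraint RL with a safe approach but does not describe the safety mechanism itself. One common way to turn the daylight and battery limits into hard constraints is action masking (a "shield"): before the policy picks the next task, discard every task whose worst-case duration and energy would make it impossible to reach the charger before nightfall. The Python sketch below illustrates that general idea under stated assumptions; it is not the authors' architecture. The task names, the per-task worst-case bounds (`TaskBounds`), the fallback action `return_and_charge`, the `reserve` parameter, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical fallback action: drive back to the base and charge.
FALLBACK = "return_and_charge"


@dataclass(frozen=True)
class TaskBounds:
    """Assumed worst-case bounds for one candidate task (illustrative values only)."""
    max_duration: float         # upper bound on execution time, hours
    max_energy: float           # upper bound on energy drawn, Wh
    max_return_duration: float  # upper bound on time to reach the charger afterwards, hours
    max_return_energy: float    # upper bound on energy to reach the charger afterwards, Wh


@dataclass(frozen=True)
class RoverState:
    daylight_left: float  # hours until nightfall
    battery: float        # Wh currently stored


def safe_actions(state: RoverState, tasks: Dict[str, TaskBounds],
                 reserve: float = 0.0) -> List[str]:
    """Return the tasks that keep a worst-case path back to the charger open.

    A task is kept only if, even when it and the following return trip take
    their maximum time and energy, the rover still reaches the charger before
    nightfall with at least `reserve` Wh left. The fallback action is always
    appended so the policy never faces an empty action set.
    """
    allowed = []
    for name, b in tasks.items():
        worst_time = b.max_duration + b.max_return_duration
        worst_energy = b.max_energy + b.max_return_energy
        if worst_time <= state.daylight_left and state.battery - worst_energy >= reserve:
            allowed.append(name)
    allowed.append(FALLBACK)
    return allowed


def masked_greedy(q_values: Dict[str, float], allowed: List[str]) -> str:
    """Pick the highest-valued action among the allowed ones; masked actions are never chosen."""
    return max(allowed, key=lambda a: q_values.get(a, float("-inf")))


if __name__ == "__main__":
    state = RoverState(daylight_left=3.0, battery=100.0)
    tasks = {
        "collect": TaskBounds(2.0, 40.0, 0.5, 10.0),
        "deliver": TaskBounds(2.5, 30.0, 1.0, 20.0),
        "move_far": TaskBounds(1.0, 70.0, 1.0, 30.0),
    }
    allowed = safe_actions(state, tasks, reserve=10.0)
    q = {"collect": 1.2, "deliver": 5.0, "move_far": 3.0, FALLBACK: 0.1}
    print(allowed, masked_greedy(q, allowed))
```

In this example `deliver` has the highest value estimate but is masked because its worst case would run past nightfall, and `move_far` is masked because it would leave less than the 10 Wh reserve, so the greedy choice is `collect`. The guarantee only holds inductively: if the rover starts in a state from which `return_and_charge` is feasible and the bounds are true upper bounds, every permitted step preserves that property.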

Original language	English
Host publication title	IAF Space Operations Symposium - Held at the 75th International Astronautical Congress, IAC 2024
Publisher	International Astronautical Federation, IAF
Pages	526-533
Number of pages	8
ISBN (Electronic)	9798331312183
DOI	https://doi.org/10.52202/078367-0056
Publication status	Published - 2024
Event	2024 IAF Space Operations Symposium at the 75th International Astronautical Congress, IAC 2024 - Milan, Italy. Duration: 14 Oct 2024 → 18 Oct 2024

Publication series

Name	Proceedings of the International Astronautical Congress, IAC
ISSN (Print)	0074-1795

Conference

Conference	2024 IAF Space Operations Symposium at the 75th International Astronautical Congress, IAC 2024
Country/Territory	Italy
City	Milan
Period	14/10/24 → 18/10/24

Access to Document

10.52202/078367-0056

Other files and links

Link to publication in Scopus

Cite this

Lu, S., Xu, R., Gao, A., Li, Z., Liu, J., Zhang, L., Zhao, Z., Zhu, S., & Li, Y. (2024). Safe reinforcement learning task planning with uncertain duration and resource consumption in limited daytime for lunar rovers. In IAF Space Operations Symposium - Held at the 75th International Astronautical Congress, IAC 2024 (pp. 526-533). (Proceedings of the International Astronautical Congress, IAC). International Astronautical Federation, IAF. https://doi.org/10.52202/078367-0056

Lu, Siyao; Xu, Rui; Gao, Ai et al. / Safe reinforcement learning task planning with uncertain duration and resource consumption in limited daytime for lunar rovers. IAF Space Operations Symposium - Held at the 75th International Astronautical Congress, IAC 2024. International Astronautical Federation, IAF, 2024. pp. 526-533 (Proceedings of the International Astronautical Congress, IAC).

@inproceedings{dbf884bf5f8f46b3804fde8bf6dca548,

title = "Safe reinforcement learning task planning with uncertain duration and resource consumption in limited daytime for lunar rovers",

keywords = "lunar rover, reinforcement learning, resource optimization, uncertainty",

author = "Siyao Lu and Rui Xu and Ai Gao and Zhaoyu Li and Jiamou Liu and Libo Zhang and Zhijun Zhao and Shengying Zhu and Yuqiong Li",

note = "Publisher Copyright: Copyright {\textcopyright} 2024 by the International Astronautical Federation (IAF). All rights reserved.; 2024 IAF Space Operations Symposium at the 75th International Astronautical Congress, IAC 2024 ; Conference date: 14-10-2024 Through 18-10-2024",

year = "2024",

doi = "10.52202/078367-0056",

language = "English",

series = "Proceedings of the International Astronautical Congress, IAC",

publisher = "International Astronautical Federation, IAF",

pages = "526--533",

booktitle = "IAF Space Operations Symposium - Held at the 75th International Astronautical Congress, IAC 2024",

address = "France",

}

Lu, S, Xu, R, Gao, A, Li, Z, Liu, J, Zhang, L, Zhao, Z, Zhu, S & Li, Y 2024, Safe reinforcement learning task planning with uncertain duration and resource consumption in limited daytime for lunar rovers. in IAF Space Operations Symposium - Held at the 75th International Astronautical Congress, IAC 2024. Proceedings of the International Astronautical Congress, IAC, International Astronautical Federation, IAF, pp. 526-533, 2024 IAF Space Operations Symposium at the 75th International Astronautical Congress, IAC 2024, Milan, Italy, 14/10/24. https://doi.org/10.52202/078367-0056

Safe reinforcement learning task planning with uncertain duration and resource consumption in limited daytime for lunar rovers. / Lu, Siyao; Xu, Rui; Gao, Ai et al.
IAF Space Operations Symposium - Held at the 75th International Astronautical Congress, IAC 2024. International Astronautical Federation, IAF, 2024. pp. 526-533 (Proceedings of the International Astronautical Congress, IAC).

TY - GEN

T1 - Safe reinforcement learning task planning with uncertain duration and resource consumption in limited daytime for lunar rovers

AU - Lu, Siyao

AU - Xu, Rui

AU - Gao, Ai

AU - Li, Zhaoyu

AU - Liu, Jiamou

AU - Zhang, Libo

AU - Zhao, Zhijun

AU - Zhu, Shengying

AU - Li, Yuqiong

PY - 2024

Y1 - 2024

KW - lunar rover

KW - reinforcement learning

KW - resource optimization

KW - uncertainty

UR - http://www.scopus.com/inward/record.url?scp=85218453793&partnerID=8YFLogxK

U2 - 10.52202/078367-0056

DO - 10.52202/078367-0056

M3 - Conference contribution

AN - SCOPUS:85218453793

T3 - Proceedings of the International Astronautical Congress, IAC

SP - 526

EP - 533

BT - IAF Space Operations Symposium - Held at the 75th International Astronautical Congress, IAC 2024

PB - International Astronautical Federation, IAF

T2 - 2024 IAF Space Operations Symposium at the 75th International Astronautical Congress, IAC 2024

Y2 - 14 October 2024 through 18 October 2024

ER -

Lu S, Xu R, Gao A, Li Z, Liu J, Zhang L et al. Safe reinforcement learning task planning with uncertain duration and resource consumption in limited daytime for lunar rovers. In IAF Space Operations Symposium - Held at the 75th International Astronautical Congress, IAC 2024. International Astronautical Federation, IAF. 2024. p. 526-533. (Proceedings of the International Astronautical Congress, IAC). doi: 10.52202/078367-0056
