TY - JOUR
T1 - Hierarchical reinforcement learning based planning method with uncertainty in limited visions for lunar rovers
AU - Lu, Siyao
AU - Xu, Rui
AU - Yu, Dengyun
AU - Li, Zhaoyu
AU - Gao, Ai
AU - Wang, Bang
AU - Pan, Bo
N1 - Publisher Copyright:
Copyright © 2023 by the International Astronautical Federation (IAF). All rights reserved.
PY - 2023
Y1 - 2023
N2 - China and Russia plan to establish the International Lunar Research Station together around 2035, when resource acquisition from the lunar surface and the construction of lunar bases will take place in the initial period. However, the accuracy of the lunar surface digital elevation map (DEM) is insufficient for the path planning and action needs of lunar rovers. Moreover, each rover's field of vision is limited compared with the vast lunar surface, so rovers must anticipate obstacles and adjust their movements for obstacle avoidance, power saving, and safety. Another problem is that acquisition and construction are long-term tasks, so pure path planning methods will not work properly. Therefore, we propose a new way of planning both the path and the task with hierarchical reinforcement learning, trained in hundreds of simulation environments in which the obstacles and the places for acquisition, charging, blending, and construction are varied. Rovers can only see several meters ahead and only know the approximate locations of targets, so uncertainty arises along the rovers' routes to the targets, which contain both small and large obstacles. Targets are assigned by the task level, which provides guidance to the path level. However, the task-level data generated by the hierarchical environment is not sufficient to train the task policies, so pre-generated data is prepared to pre-train the task policies; the pre-trained policies are then placed on the task level and updated under constraints rather than jointly trained from the beginning. In our experiments, we aim to show that our method leads each rover to finish long-term tasks without encountering large obstacles, trains the whole hierarchical policy more quickly than the traditional approach, and produces better results than pure path planning in an uncertain environment for long-term tasks.
AB - China and Russia plan to establish the International Lunar Research Station together around 2035, when resource acquisition from the lunar surface and the construction of lunar bases will take place in the initial period. However, the accuracy of the lunar surface digital elevation map (DEM) is insufficient for the path planning and action needs of lunar rovers. Moreover, each rover's field of vision is limited compared with the vast lunar surface, so rovers must anticipate obstacles and adjust their movements for obstacle avoidance, power saving, and safety. Another problem is that acquisition and construction are long-term tasks, so pure path planning methods will not work properly. Therefore, we propose a new way of planning both the path and the task with hierarchical reinforcement learning, trained in hundreds of simulation environments in which the obstacles and the places for acquisition, charging, blending, and construction are varied. Rovers can only see several meters ahead and only know the approximate locations of targets, so uncertainty arises along the rovers' routes to the targets, which contain both small and large obstacles. Targets are assigned by the task level, which provides guidance to the path level. However, the task-level data generated by the hierarchical environment is not sufficient to train the task policies, so pre-generated data is prepared to pre-train the task policies; the pre-trained policies are then placed on the task level and updated under constraints rather than jointly trained from the beginning. In our experiments, we aim to show that our method leads each rover to finish long-term tasks without encountering large obstacles, trains the whole hierarchical policy more quickly than the traditional approach, and produces better results than pure path planning in an uncertain environment for long-term tasks.
KW - hierarchical reinforcement learning
KW - lunar rover
KW - uncertainty
UR - http://www.scopus.com/inward/record.url?scp=85187988411&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85187988411
SN - 0074-1795
VL - 2023-October
JO - Proceedings of the International Astronautical Congress, IAC
JF - Proceedings of the International Astronautical Congress, IAC
T2 - 74th International Astronautical Congress, IAC 2023
Y2 - 2 October 2023 through 6 October 2023
ER -