Hierarchical reinforcement learning based planning method with uncertainty in limited visions for lunar rovers

Siyao Lu, Rui Xu, Dengyun Yu, Zhaoyu Li^*, Ai Gao, Bang Wang, Bo Pan

^*Corresponding author for this work

School of Aerospace Engineering

Research output: Contribution to journal › Conference article › Peer-reviewed

Abstract

China and Russia will jointly establish the International Lunar Research Station around 2035, and resource acquisition from the lunar surface and construction of lunar bases will be carried out in its initial period. However, the lunar surface digital elevation map (DEM) is not accurate enough to meet the needs of path planning or acting for lunar rovers. Moreover, each rover's field of vision is limited compared with the wide lunar surface, so rovers must foresee obstacles and adjust their movements for obstacle avoidance, power saving, and safety. In addition, acquisition and construction are long-term tasks, so pure path planning methods do not work properly. We therefore propose a method that plans both the path and the task by hierarchical reinforcement learning, using hundreds of simulation environments in which the obstacles and the places for acquisition, charging, blending, and construction are varied. Rovers can see only several meters ahead and know only the approximate locations of targets, so uncertainty arises on the way to the targets, where there are both small and large obstacles. Targets are given by the task level, which in turn guides the path level. However, the task-level data generated by the hierarchical environment is not enough to train the task policies, so the task policies are pre-trained on pre-generated data and then placed on the task level and updated under constraints, rather than being trained jointly from the beginning. In our experiment, we intend for our method to lead each rover to finish the long-term tasks without meeting large obstacles, to train the whole hierarchical policy more quickly than the traditional way, and to generate a better result than pure path planning in uncertain environments for long-term tasks.

Original language	English
Journal	Proceedings of the International Astronautical Congress, IAC
Volume	2023-October
Publication status	Published - 2023
Event	74th International Astronautical Congress, IAC 2023 - Baku, Azerbaijan; Duration: 2 Oct 2023 → 6 Oct 2023

Cite this

@article{b8bbc22de33b47b898bfe82179b202d7,

title = "Hierarchical reinforcement learning based planning method with uncertainty in limited visions for lunar rovers",

abstract = "China and Russia will jointly establish the International Lunar Research Station around 2035, and resource acquisition from the lunar surface and construction of lunar bases will be carried out in its initial period. However, the lunar surface digital elevation map (DEM) is not accurate enough to meet the needs of path planning or acting for lunar rovers. Moreover, each rover's field of vision is limited compared with the wide lunar surface, so rovers must foresee obstacles and adjust their movements for obstacle avoidance, power saving, and safety. In addition, acquisition and construction are long-term tasks, so pure path planning methods do not work properly. We therefore propose a method that plans both the path and the task by hierarchical reinforcement learning, using hundreds of simulation environments in which the obstacles and the places for acquisition, charging, blending, and construction are varied. Rovers can see only several meters ahead and know only the approximate locations of targets, so uncertainty arises on the way to the targets, where there are both small and large obstacles. Targets are given by the task level, which in turn guides the path level. However, the task-level data generated by the hierarchical environment is not enough to train the task policies, so the task policies are pre-trained on pre-generated data and then placed on the task level and updated under constraints, rather than being trained jointly from the beginning. In our experiment, we intend for our method to lead each rover to finish the long-term tasks without meeting large obstacles, to train the whole hierarchical policy more quickly than the traditional way, and to generate a better result than pure path planning in uncertain environments for long-term tasks.",

keywords = "hierarchical reinforcement learning, lunar rover, uncertainty",

author = "Siyao Lu and Rui Xu and Dengyun Yu and Zhaoyu Li and Ai Gao and Bang Wang and Bo Pan",

note = "Publisher Copyright: Copyright {\textcopyright} 2023 by the International Astronautical Federation (IAF). All rights reserved.; 74th International Astronautical Congress, IAC 2023 ; Conference date: 02-10-2023 Through 06-10-2023",

year = "2023",

language = "English",

volume = "2023-October",

journal = "Proceedings of the International Astronautical Congress, IAC",

issn = "0074-1795",

publisher = "International Astronautical Federation, IAF",

}

TY - JOUR

T1 - Hierarchical reinforcement learning based planning method with uncertainty in limited visions for lunar rovers

AU - Lu, Siyao

AU - Xu, Rui

AU - Yu, Dengyun

AU - Li, Zhaoyu

AU - Gao, Ai

AU - Wang, Bang

AU - Pan, Bo

PY - 2023

Y1 - 2023

N2 - China and Russia will jointly establish the International Lunar Research Station around 2035, and resource acquisition from the lunar surface and construction of lunar bases will be carried out in its initial period. However, the lunar surface digital elevation map (DEM) is not accurate enough to meet the needs of path planning or acting for lunar rovers. Moreover, each rover's field of vision is limited compared with the wide lunar surface, so rovers must foresee obstacles and adjust their movements for obstacle avoidance, power saving, and safety. In addition, acquisition and construction are long-term tasks, so pure path planning methods do not work properly. We therefore propose a method that plans both the path and the task by hierarchical reinforcement learning, using hundreds of simulation environments in which the obstacles and the places for acquisition, charging, blending, and construction are varied. Rovers can see only several meters ahead and know only the approximate locations of targets, so uncertainty arises on the way to the targets, where there are both small and large obstacles. Targets are given by the task level, which in turn guides the path level. However, the task-level data generated by the hierarchical environment is not enough to train the task policies, so the task policies are pre-trained on pre-generated data and then placed on the task level and updated under constraints, rather than being trained jointly from the beginning. In our experiment, we intend for our method to lead each rover to finish the long-term tasks without meeting large obstacles, to train the whole hierarchical policy more quickly than the traditional way, and to generate a better result than pure path planning in uncertain environments for long-term tasks.

AB - China and Russia will jointly establish the International Lunar Research Station around 2035, and resource acquisition from the lunar surface and construction of lunar bases will be carried out in its initial period. However, the lunar surface digital elevation map (DEM) is not accurate enough to meet the needs of path planning or acting for lunar rovers. Moreover, each rover's field of vision is limited compared with the wide lunar surface, so rovers must foresee obstacles and adjust their movements for obstacle avoidance, power saving, and safety. In addition, acquisition and construction are long-term tasks, so pure path planning methods do not work properly. We therefore propose a method that plans both the path and the task by hierarchical reinforcement learning, using hundreds of simulation environments in which the obstacles and the places for acquisition, charging, blending, and construction are varied. Rovers can see only several meters ahead and know only the approximate locations of targets, so uncertainty arises on the way to the targets, where there are both small and large obstacles. Targets are given by the task level, which in turn guides the path level. However, the task-level data generated by the hierarchical environment is not enough to train the task policies, so the task policies are pre-trained on pre-generated data and then placed on the task level and updated under constraints, rather than being trained jointly from the beginning. In our experiment, we intend for our method to lead each rover to finish the long-term tasks without meeting large obstacles, to train the whole hierarchical policy more quickly than the traditional way, and to generate a better result than pure path planning in uncertain environments for long-term tasks.

KW - hierarchical reinforcement learning

KW - lunar rover

KW - uncertainty

UR - http://www.scopus.com/inward/record.url?scp=85187988411&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85187988411

SN - 0074-1795

VL - 2023-October

JO - Proceedings of the International Astronautical Congress, IAC

JF - Proceedings of the International Astronautical Congress, IAC

T2 - 74th International Astronautical Congress, IAC 2023

Y2 - 2 October 2023 through 6 October 2023

ER -
