A multi-agent planning method on deep reinforcement learning for lunar rovers collaborated operation with uncertainty

Siyao Lu; Ai Gao; Rui Xu; Zhaoyu Li; Pan Huang; Chen Zhao

Siyao Lu, Ai Gao, Rui Xu, Zhaoyu Li*, Pan Huang, Chen Zhao

*Corresponding author for this work

School of Aerospace Engineering

Research output: Contribution to journal › Conference article › peer-review

Abstract

The International Lunar Research Station will be established around 2035 by China and Russia. In the initial phase, advanced lunar rovers equipped with robotic arms will act as the constructors of the station. Carrier rovers collaborate to collect lunar soil and lunar water and transport them to the mixing blender, where construction materials are produced. However, the rovers start from different places, head for different mining sites, drive along different paths, collect different kinds of resources, and arrive at the blender at different times. These conditions make activity planning and path planning for the operating rovers difficult. Moreover, obstacles are not known in advance for several reasons: the resolution of lunar maps is too low for rovers to see obstacles and plan around them beforehand, and the lunar soil makes rover speed hard to control as expected. Both introduce uncertainty for the rovers. Therefore, we propose a new collaborative planning method based on multi-agent deep reinforcement learning, in which simulation environments with randomly generated obstacles are used to train the rovers to complete their tasks while avoiding barriers and making decisions according to the circumstances. Actions for the environments and tasks are trained into neural networks in advance, replacing the plan-then-execute mode with instant decision making and avoiding plan repair. First, to simulate the lunar surface, we design a method for building training environments that contain craters of different shapes and sizes and obstacles of different sizes and locations. Both kinds of barrier obstruct the rovers' paths and must be bypassed. Second, we propose a deep reinforcement learning architecture that lets each rover decide its next step instantaneously according to its surroundings. Because the rovers are battery-powered, training aims to minimize power consumption or another custom metric.
In our experiment, our method is intended to let each rover finish its carrying work without collision while consuming less energy and finishing more quickly than the traditional approach of planning in advance and repairing the plan during execution.
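
The two ingredients named in the abstract, a training terrain with randomly placed craters and obstacles and a training signal that penalizes energy use, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the grid representation, the circular crater shape, the function names `make_terrain` and `step_reward`, and all numeric constants are assumptions chosen only for illustration.

```python
import random

# Cell types for a hypothetical grid-world terrain (illustrative encoding).
FREE, OBSTACLE, CRATER = 0, 1, 2


def make_terrain(size, n_obstacles, n_craters, max_radius, seed=None):
    """Return a size x size grid with random circular craters and point obstacles."""
    rng = random.Random(seed)
    grid = [[FREE] * size for _ in range(size)]
    # Craters: random centre and radius; circles are a simplifying assumption.
    for _ in range(n_craters):
        cx, cy = rng.randrange(size), rng.randrange(size)
        r = rng.randint(1, max_radius)
        for y in range(max(0, cy - r), min(size, cy + r + 1)):
            for x in range(max(0, cx - r), min(size, cx + r + 1)):
                if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                    grid[y][x] = CRATER
    # Obstacles: single blocked cells at random locations.
    for _ in range(n_obstacles):
        x, y = rng.randrange(size), rng.randrange(size)
        grid[y][x] = OBSTACLE
    return grid


def step_reward(grid, pos, goal, energy_per_move=1.0,
                collision_penalty=100.0, goal_bonus=50.0):
    """Reward for moving to `pos`: energy cost per move, penalty on collision,
    bonus on reaching the goal. All constants are assumed values."""
    x, y = pos
    out_of_bounds = not (0 <= y < len(grid) and 0 <= x < len(grid[0]))
    if out_of_bounds or grid[y][x] != FREE:
        return -collision_penalty
    if pos == goal:
        return goal_bonus - energy_per_move
    return -energy_per_move
```

Regenerating the terrain with a fresh seed for every training episode is one simple way to expose a policy to obstacle layouts it has not seen before; the per-move energy term stands in for the power-consumption objective and could be swapped for another custom metric.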

Original language	English
Journal	Proceedings of the International Astronautical Congress, IAC
Volume	2022-September
Publication status	Published - 2022
Event	73rd International Astronautical Congress, IAC 2022 - Paris, France
Duration	18 Sept 2022 → 22 Sept 2022

Keywords

lunar rover
reinforcement learning
uncertainty

Cite this

@article{7714bd5b423f417d9c700887bd22c8fe,

title = "A multi-agent planning method on deep reinforcement learning for lunar rovers collaborated operation with uncertainty",

abstract = "The International Lunar Research Station will be established around 2035 by China and Russia. In the initial phase, advanced lunar rovers equipped with robotic arms will act as the constructors of the station. Carrier rovers collaborate to collect lunar soil and lunar water and transport them to the mixing blender, where construction materials are produced. However, the rovers start from different places, head for different mining sites, drive along different paths, collect different kinds of resources, and arrive at the blender at different times. These conditions make activity planning and path planning for the operating rovers difficult. Moreover, obstacles are not known in advance for several reasons: the resolution of lunar maps is too low for rovers to see obstacles and plan around them beforehand, and the lunar soil makes rover speed hard to control as expected. Both introduce uncertainty for the rovers. Therefore, we propose a new collaborative planning method based on multi-agent deep reinforcement learning, in which simulation environments with randomly generated obstacles are used to train the rovers to complete their tasks while avoiding barriers and making decisions according to the circumstances. Actions for the environments and tasks are trained into neural networks in advance, replacing the plan-then-execute mode with instant decision making and avoiding plan repair. First, to simulate the lunar surface, we design a method for building training environments that contain craters of different shapes and sizes and obstacles of different sizes and locations. Both kinds of barrier obstruct the rovers' paths and must be bypassed. Second, we propose a deep reinforcement learning architecture that lets each rover decide its next step instantaneously according to its surroundings. Because the rovers are battery-powered, training aims to minimize power consumption or another custom metric.
In our experiment, our method is intended to let each rover finish its carrying work without collision while consuming less energy and finishing more quickly than the traditional approach of planning in advance and repairing the plan during execution.",

keywords = "lunar rover, reinforcement learning, uncertainty",

author = "Siyao Lu and Ai Gao and Rui Xu and Zhaoyu Li and Pan Huang and Chen Zhao",

note = "Publisher Copyright: Copyright {\textcopyright} 2022 by the International Astronautical Federation (IAF). All rights reserved.; 73rd International Astronautical Congress, IAC 2022 ; Conference date: 18-09-2022 Through 22-09-2022",

year = "2022",

language = "English",

volume = "2022-September",

journal = "Proceedings of the International Astronautical Congress, IAC",

issn = "0074-1795",

publisher = "International Astronautical Federation, IAF",

}

TY - JOUR

T1 - A multi-agent planning method on deep reinforcement learning for lunar rovers collaborated operation with uncertainty

AU - Lu, Siyao

AU - Gao, Ai

AU - Xu, Rui

AU - Li, Zhaoyu

AU - Huang, Pan

AU - Zhao, Chen

PY - 2022

Y1 - 2022

AB - The International Lunar Research Station will be established around 2035 by China and Russia. In the initial phase, advanced lunar rovers equipped with robotic arms will act as the constructors of the station. Carrier rovers collaborate to collect lunar soil and lunar water and transport them to the mixing blender, where construction materials are produced. However, the rovers start from different places, head for different mining sites, drive along different paths, collect different kinds of resources, and arrive at the blender at different times. These conditions make activity planning and path planning for the operating rovers difficult. Moreover, obstacles are not known in advance for several reasons: the resolution of lunar maps is too low for rovers to see obstacles and plan around them beforehand, and the lunar soil makes rover speed hard to control as expected. Both introduce uncertainty for the rovers. Therefore, we propose a new collaborative planning method based on multi-agent deep reinforcement learning, in which simulation environments with randomly generated obstacles are used to train the rovers to complete their tasks while avoiding barriers and making decisions according to the circumstances. Actions for the environments and tasks are trained into neural networks in advance, replacing the plan-then-execute mode with instant decision making and avoiding plan repair. First, to simulate the lunar surface, we design a method for building training environments that contain craters of different shapes and sizes and obstacles of different sizes and locations. Both kinds of barrier obstruct the rovers' paths and must be bypassed. Second, we propose a deep reinforcement learning architecture that lets each rover decide its next step instantaneously according to its surroundings. Because the rovers are battery-powered, training aims to minimize power consumption or another custom metric. In our experiment, our method is intended to let each rover finish its carrying work without collision while consuming less energy and finishing more quickly than the traditional approach of planning in advance and repairing the plan during execution.

KW - lunar rover

KW - reinforcement learning

KW - uncertainty

UR - http://www.scopus.com/inward/record.url?scp=85167578953&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85167578953

SN - 0074-1795

VL - 2022-September

JO - Proceedings of the International Astronautical Congress, IAC

JF - Proceedings of the International Astronautical Congress, IAC

T2 - 73rd International Astronautical Congress, IAC 2022

Y2 - 18 September 2022 through 22 September 2022

ER -
