A multi-agent planning method on deep reinforcement learning for lunar rovers collaborated operation with uncertainty

Siyao Lu, Ai Gao, Rui Xu, Zhaoyu Li*, Pan Huang, Chen Zhao

*Corresponding author for this work

Research output: Contribution to journal; Conference article; Peer-reviewed

Abstract

The International Lunar Research Station will be established around 2035 by China and Russia. In the initial period, advanced lunar rovers equipped with robotic arms will act as the station's constructors. Carrier rovers collaborate to collect lunar soil and lunar water, then transport them to the mixing blender where construction materials are produced. However, the rovers start from different places, head to different mining sites, drive along different paths, collect different kinds of resources, and arrive at the blender at different times. These conditions make activity planning and path planning for the operating rovers difficult. Moreover, obstacles are unknown for several reasons: the resolution of lunar maps is not high enough for rovers to detect obstacles and plan in advance, and the lunar soil makes rover speed hard to control as expected, which introduces uncertainty for the rovers. We therefore propose a new approach to collaborative planning based on multi-agent deep reinforcement learning, in which simulation environments with randomly generated obstacles are used to train the rovers to complete their tasks while avoiding barriers and making decisions according to circumstances. Actions for environments and tasks are trained into neural networks in advance, replacing the plan-then-execute mode with instant decision making and thereby avoiding plan repair. First, to simulate the lunar surface, we design a method for building the training environment, which contains craters of different shapes and sizes and obstacles of different sizes and locations. Both kinds of barrier obstruct the rovers' paths and must be bypassed. Second, we propose a deep reinforcement learning architecture that lets rovers decide the next step instantaneously according to their surroundings. Because the rovers are battery-powered, training aims to minimize power consumption or another custom metric.
In our experiments, our approach is intended to lead each rover to finish its carrying work without collision, while consuming less energy and finishing more quickly than the traditional approach of planning in advance and repairing the plan during execution.

Original language: English
Journal: Proceedings of the International Astronautical Congress, IAC
Volume: 2022-September
Publication status: Published - 2022
Event: 73rd International Astronautical Congress, IAC 2022 - Paris, France
Duration: 18 Sep 2022 to 22 Sep 2022
