Deep reinforcement learning based planning method in state space for lunar rovers

Ai Gao; Siyao Lu; Rui Xu; Zhaoyu Li; Bang Wang; Shengying Zhu; Yuhui Gao; Bo Pan

doi:10.1016/j.engappai.2023.107287

Corresponding author: Rui Xu

School of Aerospace Engineering

Research output: Contribution to journal › Article › peer-review

8 Citations (Scopus)

Abstract

Unmanned lunar rovers are essential for lunar exploration and construction. Because communication between Earth and the Moon takes time, the environment a rover executes in can differ from the one human operators observe. Given possible discrepancies between the environment assumed by the planner and the real environment during sampling tasks on the Moon, a planner that generates short plans quickly is needed. This paper therefore demonstrates a planner, based on deep reinforcement learning (DRL), for both standard and emergency planning. The planner creates a full-range plan 13.5 times faster than the traditional planner on complex problems, or 10.1 times faster when controlling the rover step by step in the state space. Based on a specific lunar sampling scenario, we propose a tracking reward that guides the rover's search through the state space within the DRL architecture. The architecture is built on a matrix representation of the state space, randomly chosen available training state pairs, and plans generated by a custom breadth-first search (BFS) planner for the tracking reward. The BFS planner includes a custom state hash algorithm and a preparation step for training state pairs, for safety and flexibility. Training and planning tests validate the effectiveness, robustness and customizability of the proposed method in a planning domain with multiple rovers. Our model can handle three kinds of emergencies, even when they occur frequently, and its success rate exceeds that of the state-of-the-art model. When facing emergencies, the average response time of our model is 324 times faster than that of the classical planner.
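The abstract mentions a breadth-first search planner that works over hashed states in a matrix-style state space. As a rough illustration only, the sketch below runs BFS on a toy domain. Everything in it is an assumption made for the example and is not taken from the paper: the domain (sites on a line, one `(site, holding_sample)` row per rover), the names `N_SITES`, `successors` and `bfs_plan`, and the use of Python's built-in tuple hashing in place of the paper's custom state hash, which the abstract does not describe.

```python
from collections import deque

N_SITES = 3  # hypothetical toy map: sites 0..2 on a line; sampling only at the last site


def successors(state):
    """Yield (action_name, next_state) pairs for the toy domain.

    The state is a matrix stored as a tuple of rows, one row per rover:
    (site, holding_sample). Nested tuples are immutable and hashable.
    """
    for i, (site, holding) in enumerate(state):
        moves = []
        if site + 1 < N_SITES:
            moves.append((f"move(r{i},{site}->{site + 1})", (site + 1, holding)))
        if site > 0:
            moves.append((f"move(r{i},{site}->{site - 1})", (site - 1, holding)))
        if site == N_SITES - 1 and not holding:
            moves.append((f"sample(r{i})", (site, 1)))
        for name, row in moves:
            # Replace only rover i's row; the other rows are unchanged.
            yield name, state[:i] + (row,) + state[i + 1:]


def bfs_plan(start, goal):
    """Return a shortest list of action names from start to goal, or None."""
    # parent doubles as the visited set; dict lookups rely on the tuple hash.
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            plan = []
            while parent[state] is not None:
                prev, action = parent[state]
                plan.append(action)
                state = prev
            return plan[::-1]
        for action, nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = (state, action)
                queue.append(nxt)
    return None  # goal is not reachable from start


# One rover starts at site 0 without a sample and must end at site 0 holding one.
plan = bfs_plan(((0, 0),), ((0, 1),))
print(plan)
```

In this toy setting, a (start, goal) state pair together with the plan BFS returns for it is the kind of data that could serve as a reference for a reward that tracks a known plan. How the paper actually builds its tracking reward is not specified in the abstract.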

Original language	English
Article number	107287
Journal	Engineering Applications of Artificial Intelligence
Volume	127
DOI	https://doi.org/10.1016/j.engappai.2023.107287
Publication status	Published - Jan 2024

Cite this

Gao, A., Lu, S., Xu, R., Li, Z., Wang, B., Zhu, S., Gao, Y., & Pan, B. (2024). Deep reinforcement learning based planning method in state space for lunar rovers. Engineering Applications of Artificial Intelligence, 127, Article 107287. https://doi.org/10.1016/j.engappai.2023.107287

@article{1f8dabd4fdab4d7d83b5158e49aed2b9,

title = "Deep reinforcement learning based planning method in state space for lunar rovers",

abstract = "The unmanned lunar rover is essential for lunar exploration and construction. Executing environment differ from what humans get since communication needs time from Earth to Moon. Considering possible discrepancies between the pre-considered environment from the planner and the real environment for sampling tasks on the moon, a planner that generates short plans quickly should be used. Therefore, a planner for both standard and emergency planning based on deep reinforcement learning (DRL) is demonstrated in this paper. This planner can create a full-range plan 13.5 times faster than the traditional planner on complex problems or 10.1 times faster while controlling the rover step-by-step in the state space. Based on a specific moon sampling scenario, we propose a tracking reward guiding the rover searching in the states in the deep reinforcement learning architecture which is presented and created by a state space representation by matrix, randomly available training state pairs and the plans generated by a custom breadth-first search (BFS) planner for the tracking reward. The BFS planner obtains a custom state hash algorithm and a preparation to train state pairs for safety and flexibility. Tests on training and planning are performed to validate the effectiveness, robustness and customization of the proposed method in a planning domain with multiple rovers. Our model can handle three kinds of emergencies, even if they occur frequently. The success rate is beyond the state-of-the-art model. While facing emergencies, the average response time of our model is 324 times faster than the classical planner.",

keywords = "Adaptive planner, Automated planning, Deep reinforcement learning, Lunar rover",

author = "Ai Gao and Siyao Lu and Rui Xu and Zhaoyu Li and Bang Wang and Shengying Zhu and Yuhui Gao and Bo Pan",

note = "Publisher Copyright: {\textcopyright} 2023 Elsevier Ltd",

year = "2024",

month = jan,

doi = "10.1016/j.engappai.2023.107287",

language = "English",

volume = "127",

journal = "Engineering Applications of Artificial Intelligence",

issn = "0952-1976",

publisher = "Elsevier Ltd.",

}

TY - JOUR

T1 - Deep reinforcement learning based planning method in state space for lunar rovers

AU - Gao, Ai

AU - Lu, Siyao

AU - Xu, Rui

AU - Li, Zhaoyu

AU - Wang, Bang

AU - Zhu, Shengying

AU - Gao, Yuhui

AU - Pan, Bo

PY - 2024/1

Y1 - 2024/1

N2 - The unmanned lunar rover is essential for lunar exploration and construction. Executing environment differ from what humans get since communication needs time from Earth to Moon. Considering possible discrepancies between the pre-considered environment from the planner and the real environment for sampling tasks on the moon, a planner that generates short plans quickly should be used. Therefore, a planner for both standard and emergency planning based on deep reinforcement learning (DRL) is demonstrated in this paper. This planner can create a full-range plan 13.5 times faster than the traditional planner on complex problems or 10.1 times faster while controlling the rover step-by-step in the state space. Based on a specific moon sampling scenario, we propose a tracking reward guiding the rover searching in the states in the deep reinforcement learning architecture which is presented and created by a state space representation by matrix, randomly available training state pairs and the plans generated by a custom breadth-first search (BFS) planner for the tracking reward. The BFS planner obtains a custom state hash algorithm and a preparation to train state pairs for safety and flexibility. Tests on training and planning are performed to validate the effectiveness, robustness and customization of the proposed method in a planning domain with multiple rovers. Our model can handle three kinds of emergencies, even if they occur frequently. The success rate is beyond the state-of-the-art model. While facing emergencies, the average response time of our model is 324 times faster than the classical planner.

KW - Adaptive planner

KW - Automated planning

KW - Deep reinforcement learning

KW - Lunar rover

UR - http://www.scopus.com/inward/record.url?scp=85175237470&partnerID=8YFLogxK

U2 - 10.1016/j.engappai.2023.107287

DO - 10.1016/j.engappai.2023.107287

M3 - Article

AN - SCOPUS:85175237470

SN - 0952-1976

VL - 127

JO - Engineering Applications of Artificial Intelligence

JF - Engineering Applications of Artificial Intelligence

M1 - 107287

ER -
