Abstract
A novel morphing strategy based on reinforcement learning (RL) is developed to solve the minimum-flight-time morphing decision-making problem for a long-range variable-sweep morphing aircraft. The proposed strategy addresses the sparse-reward, no-reference decision-making problem that arises from terminal performance objectives and long-range missions. A double-layer morphing-flight control framework is established to decouple the design of the morphing strategy from the flight controller while ensuring flight stability. Under this framework, an RL agent is designed to learn the minimum-flight-time morphing strategy. Specifically, the reward function is divided into primary-goal rewards and sub-goal rewards to deal with the sparse-reward, no-reference issue. A multi-stage progressive training scheme is developed to train the designed RL agent with a sequence of training environments that gradually converges to the actual world. This scheme accelerates the training process and promotes the convergence of the RL agent. Simulation results under nominal and dispersed conditions demonstrate the optimality and robustness of the proposed morphing strategy. Moreover, its generalization ability is validated in an untrained scenario.
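The abstract's division of the reward into a sparse primary-goal term (paid only at episode end, favoring shorter flight time) and dense sub-goal shaping terms can be sketched as follows. This is a minimal illustration of the general reward-decomposition idea, not the paper's actual formulation; all function names, terms, and weights here are hypothetical assumptions.

```python
def primary_goal_reward(done: bool, flight_time: float, t_max: float) -> float:
    """Sparse terminal reward paid only when the episode ends.
    Shorter flight time yields a larger reward (minimum-flight-time objective).
    The normalization by t_max is an illustrative assumption."""
    if not done:
        return 0.0
    return (t_max - flight_time) / t_max  # in [0, 1] when flight_time <= t_max


def sub_goal_reward(range_progress: float, prev_progress: float,
                    w_progress: float = 1.0) -> float:
    """Dense shaping reward for incremental downrange progress at each step,
    mitigating reward sparsity over a long-range mission. The choice of
    progress as the sub-goal signal is an assumption for illustration."""
    return w_progress * (range_progress - prev_progress)


def total_reward(done: bool, flight_time: float, t_max: float,
                 range_progress: float, prev_progress: float) -> float:
    """Combined reward: sparse primary-goal term plus dense sub-goal term."""
    return (primary_goal_reward(done, flight_time, t_max)
            + sub_goal_reward(range_progress, prev_progress))
```

For example, a mid-episode step that advances downrange progress from 0.1 to 0.2 earns only the dense shaping term, while a terminal step additionally earns the flight-time-dependent primary reward.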
Original language | English
---|---
Article number | 109087
Journal | Aerospace Science and Technology
Volume | 148
DOI |
Publication status | Published - May 2024