TY - JOUR
T1 - Robust Low-Thrust Trajectory Design for Interplanetary Spaceflight
T2 - An Adaptive Latent Reinforcement Learning Method
AU - Gao, Han
AU - Lin, Yanghui
AU - Sun, Zhongqi
AU - Cui, Bing
AU - Zhang, Guangchen
AU - Xia, Yuanqing
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2026
Y1 - 2026
N2 - This article investigates the problem of robust trajectory design for low-thrust spacecraft subject to state and observation uncertainties. An adaptive latent reinforcement learning (RL) scheme based on sequential latent variable models (SLVMs) is proposed to address this issue. First, an SLVM is employed to learn representations of the uncertain environment and to predict future observations. Subsequently, by integrating SLVM-based representation learning with proximal policy optimization (PPO), a stochastic latent PPO (SLPPO) scheme is introduced. Distinct from existing methods, the control policy is derived from learned stochastic latent variables rather than raw uncertain observations, which effectively mitigates the adverse impact of uncertainties on control performance. Furthermore, to enhance training efficiency, an improved dense reward-shaping mechanism is designed based on the SLVM's observation predictions and adaptive techniques. Finally, numerical simulations of two rendezvous missions validate the effectiveness of the proposed approach.
AB - This article investigates the problem of robust trajectory design for low-thrust spacecraft subject to state and observation uncertainties. An adaptive latent reinforcement learning (RL) scheme based on sequential latent variable models (SLVMs) is proposed to address this issue. First, an SLVM is employed to learn representations of the uncertain environment and to predict future observations. Subsequently, by integrating SLVM-based representation learning with proximal policy optimization (PPO), a stochastic latent PPO (SLPPO) scheme is introduced. Distinct from existing methods, the control policy is derived from learned stochastic latent variables rather than raw uncertain observations, which effectively mitigates the adverse impact of uncertainties on control performance. Furthermore, to enhance training efficiency, an improved dense reward-shaping mechanism is designed based on the SLVM's observation predictions and adaptive techniques. Finally, numerical simulations of two rendezvous missions validate the effectiveness of the proposed approach.
KW - Latent variable model
KW - reinforcement learning (RL)
KW - reward function
KW - robust trajectory optimization
UR - https://www.scopus.com/pages/publications/105027462092
U2 - 10.1109/TCYB.2026.3651240
DO - 10.1109/TCYB.2026.3651240
M3 - Article
AN - SCOPUS:105027462092
SN - 2168-2267
JO - IEEE Transactions on Cybernetics
JF - IEEE Transactions on Cybernetics
ER -