含动力学奖励的航天器编队深度强化学习控制

Translated title of the contribution: Deep Reinforcement Learning Control for Spacecraft Formation With Dynamical Reward
Wei Cheng Jin, Ti Chen, Hai Yan Hu*
*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

This paper presents a deep reinforcement learning control method for spacecraft formation that ensures the dynamical feasibility of the trajectory and optimizes fuel consumption by introducing a dynamical reward. Based on the proximal policy optimization algorithm, a dynamic model of relative motion with J2 perturbation is embedded in the training environment, and the inputs of the Actor and Critic networks are the locally observed information of each spacecraft. The Actor network outputs the desired position and velocity of the spacecraft; because the dynamic model constrains the transitions between any two actions of the policy, the resulting trajectory remains dynamically feasible. The Critic network estimates the advantage function, constrained by the dynamic model, from the locally observed information, and the Actor network updates its parameters based on this advantage function. Furthermore, the dynamical reward is defined as the negative of the fuel consumption. As a result, by combining collision-avoidance and task-related rewards, the trained Actor network accomplishes the distributed spacecraft formation task while optimizing fuel consumption.
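The abstract describes a composite reward combining a dynamical (fuel) term, a task-related term, and a collision-avoidance term. A minimal sketch of how such a reward might be composed is shown below; all function names, weights, and distance measures are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

# Hedged sketch of a composite reward for formation control.
# Weights, the safe radius, and the fuel proxy (|delta-v|) are
# illustrative assumptions, not values from the paper.
def composite_reward(delta_v, pos, target_pos, neighbor_positions,
                     safe_radius=1.0, w_fuel=1.0, w_task=1.0, w_collision=10.0):
    """Combine dynamical (fuel), task, and collision-avoidance rewards."""
    # Dynamical reward: negative fuel consumption, proxied by |delta-v|.
    r_fuel = -w_fuel * np.linalg.norm(delta_v)
    # Task-related reward: negative distance to the desired formation slot.
    r_task = -w_task * np.linalg.norm(pos - target_pos)
    # Collision-avoidance penalty for neighbors inside the safe radius.
    r_collision = 0.0
    for q in neighbor_positions:
        d = np.linalg.norm(pos - q)
        if d < safe_radius:
            r_collision -= w_collision * (safe_radius - d)
    return r_fuel + r_task + r_collision
```

In this sketch the agent is rewarded (less penalized) as it reaches its formation slot with little thrust while staying outside a safe radius of its neighbors; the actual paper defines the dynamical reward as the negative of fuel consumption and combines it with its own task and collision terms.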

Original language: Chinese (Traditional)
Pages (from-to): 2283-2292
Number of pages: 10
Journal: Zidonghua Xuebao/Acta Automatica Sinica
Volume: 51
Issue number: 10
Publication status: Published - Oct 2025
Externally published: Yes
