Abstract
This paper presents a deep reinforcement learning control method for spacecraft formation that ensures the dynamical feasibility of the generated trajectory and optimizes fuel consumption by introducing a dynamical reward. Based on the proximal policy optimization (PPO) algorithm, a dynamic model of relative motion with J2 perturbation is incorporated into the training environment, and the Actor and Critic networks take each spacecraft's locally observed information as input. The Actor network outputs the spacecraft's desired position and velocity; because the dynamic model constrains the transition between any two actions of the policy, the output trajectory accounts for dynamical feasibility. The Critic network estimates the advantage function, constrained by the dynamic model, from the local observations, and the Actor network updates its parameters based on this advantage function. Furthermore, the dynamical reward is defined as the negative of the fuel consumption. As a result, combined with collision-avoidance and task-related rewards, the trained Actor network accomplishes the distributed spacecraft formation task while optimizing fuel consumption.
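The reward composition described in the abstract (dynamical reward as the negative of fuel consumption, plus collision-avoidance and task-related terms) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function name, weights, safety radius, and penalty magnitude are all assumptions.

```python
import numpy as np

def formation_reward(delta_v, dist_to_target, dist_to_neighbors,
                     safe_radius=10.0, w_fuel=1.0, w_task=0.1):
    """Composite reward: dynamical (fuel) + task + collision-avoidance terms.

    All weights and thresholds below are illustrative placeholders,
    not values from the paper.
    """
    # Dynamical reward: negative of a fuel-consumption proxy (|delta-v|).
    r_fuel = -w_fuel * np.linalg.norm(delta_v)
    # Task-related reward: smaller distance to the formation slot is better.
    r_task = -w_task * dist_to_target
    # Collision-avoidance penalty, applied only inside the safety radius.
    r_collision = -100.0 if np.min(dist_to_neighbors) < safe_radius else 0.0
    return r_fuel + r_task + r_collision
```

In an actual PPO training loop, this scalar would be returned by the environment at each step, so minimizing accumulated fuel use and avoiding collisions are learned jointly with the formation-keeping objective.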
| Translated title of the contribution | Deep Reinforcement Learning Control for Spacecraft Formation With Dynamical Reward |
|---|---|
| Original language | Chinese (Traditional) |
| Pages (from-to) | 2283-2292 |
| Number of pages | 10 |
| Journal | Zidonghua Xuebao/Acta Automatica Sinica |
| Volume | 51 |
| Issue number | 10 |
| DOIs | |
| Publication status | Published - Oct 2025 |
| Externally published | Yes |