TY - JOUR
T1 - Impulsive maneuver strategy for multi-agent orbital pursuit-evasion game under sparse rewards
AU - Wang, Hongbo
AU - Zhang, Yao
N1 - Publisher Copyright:
© 2024
PY - 2024/12
Y1 - 2024/12
N2 - To address the subjectivity of dense reward designs for the orbital pursuit-evasion game with multiple optimization objectives, this paper proposes a reinforcement learning method with a hierarchical network structure to guide game strategies under sparse rewards. Initially, to overcome the convergence challenges in reinforcement learning training under sparse rewards, a hierarchical network structure is proposed based on hindsight experience replay. Subsequently, considering the strict constraints imposed by orbital dynamics on the spacecraft state space, the reachable domain method is introduced to refine the subgoal space in the hierarchical network, further facilitating the achievement of subgoals. Finally, by adopting a centralized-training, layered-execution approach, a complete multi-agent reinforcement learning method with the hierarchical network structure is established, enabling networks at each level to learn effectively in parallel within sparse reward environments. Numerical simulations indicate that, under the single-agent reinforcement learning framework, the proposed method exhibits superior stability in the late training stage and improves exploration efficiency in the early stage by 38.89% to 55.56% compared with the baseline method. Under the multi-agent reinforcement learning framework, as the relative distance decreases, the subgoals generated by the hierarchical network transition from long-term to short-term, aligning with human behavioral logic.
AB - To address the subjectivity of dense reward designs for the orbital pursuit-evasion game with multiple optimization objectives, this paper proposes a reinforcement learning method with a hierarchical network structure to guide game strategies under sparse rewards. Initially, to overcome the convergence challenges in reinforcement learning training under sparse rewards, a hierarchical network structure is proposed based on hindsight experience replay. Subsequently, considering the strict constraints imposed by orbital dynamics on the spacecraft state space, the reachable domain method is introduced to refine the subgoal space in the hierarchical network, further facilitating the achievement of subgoals. Finally, by adopting a centralized-training, layered-execution approach, a complete multi-agent reinforcement learning method with the hierarchical network structure is established, enabling networks at each level to learn effectively in parallel within sparse reward environments. Numerical simulations indicate that, under the single-agent reinforcement learning framework, the proposed method exhibits superior stability in the late training stage and improves exploration efficiency in the early stage by 38.89% to 55.56% compared with the baseline method. Under the multi-agent reinforcement learning framework, as the relative distance decreases, the subgoals generated by the hierarchical network transition from long-term to short-term, aligning with human behavioral logic.
KW - Hierarchical network
KW - Hindsight experience
KW - Impulsive thrust
KW - Orbital pursuit-evasion game
KW - Reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85205697552&partnerID=8YFLogxK
U2 - 10.1016/j.ast.2024.109618
DO - 10.1016/j.ast.2024.109618
M3 - Article
AN - SCOPUS:85205697552
SN - 1270-9638
VL - 155
JO - Aerospace Science and Technology
JF - Aerospace Science and Technology
M1 - 109618
ER -