TY - JOUR
T1 - Maximum entropy deep inverse reinforcement learning-based energy management strategy for hybrid electric logistics trucks
AU - Lu, Qizhe
AU - Fang, Jiayi
AU - Yang, Chao
AU - Tang, Wenbin
N1 - Publisher Copyright: © 2025 Elsevier Ltd
PY - 2025/11/30
Y1 - 2025/11/30
N2 - Considering external environmental factors in the energy management strategy (EMS) is vital for hybrid electric logistics trucks. In contrast to traditional research, which models and analyzes one or several external factors, this paper directly investigates the underlying mechanism by which related state quantities influence energy allocation. A deep reinforcement learning-based EMS is proposed to treat the strongly coupled human–environment–vehicle system from a novel perspective, where all external environmental factors serve as inputs to the model, directly influencing the decision-making of the energy management module through complex internal couplings. First, expert actions are obtained through dynamic programming (DP) to generate a dataset of optimal state–action pairs that serve as reference training samples. Subsequently, a maximum entropy deep inverse reinforcement learning framework is developed to uncover the latent decision-making mechanisms of the expert policy. Within this framework, a reward function network based on gated recurrent units is designed to process sequential state information. The expert policy is modeled as a truncated Gaussian distribution to provide a soft learning target. The Kullback–Leibler divergence between the soft policy distribution induced by the reward function and the expert distribution is minimized, ensuring the approximation accuracy of the learned reward-guided policy while maintaining sufficient stochasticity and exploratory capability. Finally, by integrating multidimensional state information from the information layer, the trained reward function is used to achieve efficient energy allocation. The simulation results indicate that the proposed EMS approximates the global optimal DP solution more closely than existing approaches.
KW - Cyber–physical systems
KW - Energy management strategy
KW - Hybrid electric logistics truck
KW - Maximum entropy inverse reinforcement learning
UR - https://www.scopus.com/pages/publications/105018857926
U2 - 10.1016/j.energy.2025.138905
DO - 10.1016/j.energy.2025.138905
M3 - Article
AN - SCOPUS:105018857926
SN - 0360-5442
VL - 338
JO - Energy
JF - Energy
M1 - 138905
ER -