TY - JOUR
T1 - Energy management in HDHEV with dual APUs: Enhancing soft actor-critic using clustered experience replay and multi-dimensional priority sampling
AU - Zhang, Dongfang
AU - Sun, Wei
AU - Zou, Yuan
AU - Zhang, Xudong
N1 - Publisher Copyright: © 2025 Elsevier Ltd
PY - 2025/3/15
Y1 - 2025/3/15
AB - Traditional experience sampling methods in reinforcement learning often overlook sample diversity, which limits learning effectiveness. This research proposes an Enhanced Soft Actor-Critic (ESAC) algorithm for energy management in Heavy-Duty Hybrid Electric Vehicles equipped with dual Auxiliary Power Units. ESAC addresses the limitations of existing methods by integrating multi-dimensional evaluation metrics and the BIRCH clustering algorithm for online experience sampling. The proposed approach optimizes performance in complex multi-power source systems, ensuring diverse sample selection and enhancing learning capacity. Comparative analyses of ESAC against TD3, SAC, and SAC-BIRCH-PER demonstrate that ESAC achieves superior convergence performance, with a nearly 10-episode faster convergence rate than Prioritized Experience Replay. Additionally, ESAC shows significant reductions in fuel consumption—up to 5.32 % compared to the dynamic programming benchmark—outperforming SAC and TD3 by 10.54 % and 8.84 %, respectively. These results highlight that enhancing data diversity and prioritization not only stabilizes learning but also optimizes fuel efficiency in low-speed, high-torque conditions, thereby providing a robust solution for real-world energy management challenges.
KW - BIRCH algorithm
KW - Energy management strategy
KW - Heavy-duty hybrid electric vehicles
KW - Multi-dimensions priority
KW - Soft actor-critic
UR - http://www.scopus.com/inward/record.url?scp=85217980253&partnerID=8YFLogxK
U2 - 10.1016/j.energy.2025.134926
DO - 10.1016/j.energy.2025.134926
M3 - Article
AN - SCOPUS:85217980253
SN - 0360-5442
VL - 319
JO - Energy
JF - Energy
M1 - 134926
ER -