Energy management in HDHEV with dual APUs: Enhancing soft actor-critic using clustered experience replay and multi-dimensional priority sampling

Dongfang Zhang, Wei Sun, Yuan Zou*, Xudong Zhang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Traditional experience sampling methods in reinforcement learning often overlook sample diversity, which limits learning effectiveness. This research proposes an Enhanced Soft Actor-Critic (ESAC) algorithm for energy management in heavy-duty hybrid electric vehicles (HDHEVs) equipped with dual auxiliary power units (APUs). ESAC addresses the limitations of existing methods by integrating multi-dimensional evaluation metrics and the BIRCH clustering algorithm for online experience sampling. The proposed approach optimizes performance in complex multi-power-source systems, ensuring diverse sample selection and enhancing learning capacity. Comparative analyses of ESAC against TD3, SAC, and SAC-BIRCH-PER demonstrate that ESAC achieves superior convergence performance, converging nearly 10 episodes faster than Prioritized Experience Replay. Additionally, ESAC shows significant reductions in fuel consumption (up to 5.32 % compared to the dynamic programming benchmark), outperforming SAC and TD3 by 10.54 % and 8.84 %, respectively. These results highlight that enhancing data diversity and prioritization not only stabilizes learning but also optimizes fuel efficiency in low-speed, high-torque conditions, thereby providing a robust solution for real-world energy management challenges.
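The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of combining clustered replay with a multi-metric priority. It is not the authors' ESAC implementation. All names (`composite_priority`, `sample_batch`), the two example metrics, the weighted-sum combination, and the round-robin cluster schedule are assumptions made for illustration. Cluster labels are taken as given; in practice they might come from an incremental BIRCH model fitted on the stored transitions.

```python
import random
from collections import defaultdict


def composite_priority(metrics, weights):
    """Combine several non-negative per-transition metrics into one score.

    A weighted sum is an assumed, illustrative choice; the paper's actual
    multi-dimensional priority is not specified in the abstract.
    """
    return sum(w * m for w, m in zip(weights, metrics))


def sample_batch(cluster_labels, metrics, weights, batch_size, rng=None, eps=1e-6):
    """Return buffer indices sampled with cluster-level diversity.

    Clusters are visited round-robin so every cluster contributes to the
    batch; within a cluster, an index is drawn with probability
    proportional to its composite priority (sampling with replacement).
    """
    rng = rng or random.Random()

    # Group buffer indices by their (precomputed) cluster label.
    clusters = defaultdict(list)
    for idx, label in enumerate(cluster_labels):
        clusters[label].append(idx)
    labels = sorted(clusters)

    batch = []
    for i in range(batch_size):
        members = clusters[labels[i % len(labels)]]
        # eps keeps every transition reachable even if its score is zero.
        scores = [composite_priority(metrics[j], weights) + eps for j in members]
        batch.append(rng.choices(members, weights=scores, k=1)[0])
    return batch


# Hypothetical buffer of six transitions in three clusters, each with two
# made-up metrics (for example a TD-error magnitude and a reward term).
labels = [0, 0, 0, 1, 1, 2]
metrics = [(0.9, 0.1), (0.2, 0.3), (0.1, 0.1), (0.5, 0.5), (0.4, 0.0), (0.05, 0.9)]
batch = sample_batch(labels, metrics, weights=(0.7, 0.3), batch_size=6,
                     rng=random.Random(0))
```

With this schedule each of the three clusters supplies exactly two of the six indices, while the transition chosen inside a cluster still favors higher composite scores. A plain prioritized replay buffer would instead sample over the whole buffer, which can let one dense region of the state space dominate the batch.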

Original language: English
Article number: 134926
Journal: Energy
Volume: 319
DOIs
Publication status: Published - 15 Mar 2025

Keywords

  • BIRCH algorithm
  • Energy management strategy
  • Heavy-duty hybrid electric vehicles
  • Multi-dimensions priority
  • Soft actor-critic
