Energy management optimization for connected hybrid electric vehicle using offline reinforcement learning

Hongwen He*, Zegong Niu, Yong Wang, Ruchen Huang, Yiwen Shou

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

12 Citations (Scopus)

Abstract

Energy management strategy (EMS) is critical to ensuring the long-term energy economy of hybrid electric vehicles. Classical deep reinforcement learning algorithms suffer from issues such as safety constraints and the simulation-to-real gap, which make them difficult to apply to industrial tasks. This paper therefore proposes a novel EMS and a policy-updating method based on an offline deep reinforcement learning algorithm to address the energy optimization problem. Firstly, the batch-constrained deep Q-learning algorithm is applied to train the control strategy from existing datasets without interaction with the environment. Secondly, an EMS updating method is proposed to improve adaptability under complicated driving cycles. The Jensen–Shannon divergence is introduced to determine, in real time, when the offline control strategy should be updated. Finally, the optimality and effectiveness of the proposed EMS are validated, and its real-time and adaptive performance is also verified. The results indicate that the proposed EMS can learn a superior policy from fixed data, and that the proposed updating method can use real-time data to update the offline policy so that its energy consumption approaches that of an online deep reinforcement learning-based strategy.
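
The abstract does not spell out how the divergence-based update trigger is implemented; the following is a minimal, hypothetical Python sketch of the general idea, assuming discretized vehicle-speed histograms as the distributions being compared. The function names, the binning scheme, and the 0.15 threshold are illustrative assumptions, not details taken from the paper.

import numpy as np
from scipy.spatial.distance import jensenshannon

def speed_histogram(speeds, bins=np.linspace(0.0, 40.0, 41)):
    """Discretize vehicle speeds (m/s) into a normalized histogram."""
    hist, _ = np.histogram(speeds, bins=bins)
    return hist / max(hist.sum(), 1)

def should_update_policy(offline_speeds, recent_speeds, threshold=0.15):
    """Flag an offline-policy update when the Jensen-Shannon divergence between
    the offline training-data speed distribution and the recently observed one
    grows large.

    Note: scipy returns the JS *distance* (square root of the divergence),
    so it is squared here to obtain the divergence. The threshold is an
    illustrative assumption, not a value from the paper."""
    p = speed_histogram(offline_speeds)
    q = speed_histogram(recent_speeds)
    js_divergence = jensenshannon(p, q, base=2) ** 2
    return js_divergence > threshold

# Example: compare the offline dataset's speed profile with a new driving cycle.
rng = np.random.default_rng(0)
offline = rng.normal(15.0, 5.0, size=10_000).clip(0.0, 40.0)   # urban-like cycle
recent = rng.normal(25.0, 6.0, size=600).clip(0.0, 40.0)       # highway-like cycle
if should_update_policy(offline, recent):
    print("Driving conditions shifted: update the offline EMS policy with real-time data.")

In this sketch, a large divergence between the two distributions indicates that current driving conditions differ substantially from those in the fixed offline dataset, which is when refreshing the offline policy with real-time data would be most beneficial.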

Original language: English
Article number: 108517
Journal: Journal of Energy Storage
Volume: 72
DOIs
Publication status: Published - 25 Nov 2023

Keywords

  • Energy management strategy (EMS)
  • Hybrid electric vehicle (HEV)
  • Jensen–Shannon divergence
  • Offline deep reinforcement learning (DRL)
