Reinforcement Learning for Quantization of Boundary Control Inputs: A Comparison of PPO-based Strategies

Yibo Wang*, Wen Kang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This paper investigates the boundary stabilization problem for the Korteweg-de Vries (KdV) system with quantized control inputs via a deep reinforcement learning (DRL) approach. To examine how the placement of the quantizer affects stabilization performance, we discuss two scenarios: the quantizer placed in the environment and the quantizer placed in the agent. In the case of introducing the quantizer into the agent, we further explore two variations: optimizing the parameters of a discretized continuous distribution and directly optimizing the parameters of a discrete distribution. Simulation results demonstrate that the proposed proximal policy optimization (PPO)-based strategies can train DRL controllers that effectively stabilize the target system, with the approach that directly learns the parameters of the discrete distribution achieving the highest stabilization efficiency among the quantization-based scenarios.
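The paper itself does not list its implementation, but the two quantizer placements contrasted in the abstract can be sketched in a minimal form. In the code below, all names (`make_levels`, `quantize`, `sample_discrete`, the level count `M`, and the saturation bound `u_max`) are illustrative assumptions, not the authors' code: the first function plays the role of a quantizer in the environment (the agent emits a continuous control, the environment snaps it to the nearest level), while the second plays the role of a quantizer in the agent (the policy directly parameterizes a categorical distribution over the quantization levels, as in the discrete-distribution variant).

```python
import numpy as np

def make_levels(u_max=1.0, M=4):
    # 2M + 1 uniformly spaced quantization levels in [-u_max, u_max];
    # the grid shape is an illustrative choice, not the paper's.
    return np.linspace(-u_max, u_max, 2 * M + 1)

def quantize(u, levels):
    # Scenario 1: quantizer in the environment. The agent outputs a
    # continuous boundary control u; the environment applies the
    # nearest quantization level to the PDE boundary condition.
    return levels[np.argmin(np.abs(levels - u))]

def sample_discrete(logits, levels, rng):
    # Scenario 2: quantizer in the agent. The policy network outputs
    # one logit per level and samples from the resulting categorical
    # distribution, so the action is quantized by construction.
    p = np.exp(logits - logits.max())
    p /= p.sum()
    idx = rng.choice(len(levels), p=p)
    # PPO needs the log-probability of the sampled action for its
    # clipped surrogate objective.
    return levels[idx], np.log(p[idx])

levels = make_levels()
print(quantize(0.23, levels))  # snaps to the nearest level, 0.25

rng = np.random.default_rng(0)
action, logp = sample_discrete(np.zeros(len(levels)), levels, rng)
print(action, logp)
```

In the discretized-continuous variant the abstract mentions, one would instead sample from a continuous distribution (e.g. a Gaussian policy head) and pass the sample through `quantize` inside the agent; the categorical version above avoids that extra step by learning the discrete probabilities directly.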

Original language: English
Title of host publication: Proceedings of the 43rd Chinese Control Conference, CCC 2024
Editors: Jing Na, Jian Sun
Publisher: IEEE Computer Society
Pages: 1093-1098
Number of pages: 6
ISBN (Electronic): 9789887581581
DOIs
Publication status: Published - 2024
Event: 43rd Chinese Control Conference, CCC 2024 - Kunming, China
Duration: 28 Jul 2024 – 31 Jul 2024

Publication series

Name: Chinese Control Conference, CCC
ISSN (Print): 1934-1768
ISSN (Electronic): 2161-2927

Conference

Conference: 43rd Chinese Control Conference, CCC 2024
Country/Territory: China
City: Kunming
Period: 28/07/24 – 31/07/24

Keywords

  • Boundary stabilization
  • Deep reinforcement learning
  • Input quantization
  • The nonlinear Korteweg-de Vries equation
