
BEHAVIOR PRIOR REPRESENTATION LEARNING FOR OFFLINE REINFORCEMENT LEARNING

  • Hongyu Zang
  • Xin Li*
  • Jie Yu
  • Chen Liu
  • Riashat Islam
  • Rémi Tachet des Combes
  • Romain Laroche
  • *Corresponding author of this work
  • Beijing Institute of Technology
  • Mila-Québec AI Institute HEC
  • Microsoft USA

Research output: Contribution to conference › Paper › peer-review

Abstract

Offline reinforcement learning (RL) struggles in environments with rich and noisy inputs, where the agent only has access to a fixed dataset without environment interactions. Past works have proposed common workarounds based on the pre-training of state representations, followed by policy training. In this work, we introduce a simple, yet effective approach for learning state representations. Our method, Behavior Prior Representation (BPR), learns state representations with an easy-to-integrate objective based on behavior cloning of the dataset: we first learn a state representation by mimicking actions from the dataset, and then train a policy on top of the fixed representation, using any off-the-shelf Offline RL algorithm. Theoretically, we prove that BPR provides performance guarantees when integrated into algorithms that have either policy improvement guarantees (conservative algorithms) or produce lower bounds of the policy values (pessimistic algorithms). Empirically, we show that BPR combined with existing state-of-the-art Offline RL algorithms leads to significant improvements across several offline control benchmarks. The code is available at https://github.com/bit1029public/offline_bpr.
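The two-phase pipeline described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' released code: the names (`Encoder`, `bc_pretrain`, the network sizes, and the MSE behavior-cloning loss for continuous actions) are assumptions for the sketch; phase 2 would plug the frozen encoder into any off-the-shelf offline RL algorithm.

```python
# Hedged sketch of BPR's two phases: (1) learn a state representation by
# behavior cloning the dataset, (2) freeze it and train an offline RL policy
# on top. All names and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps raw states s to a compact representation phi(s)."""
    def __init__(self, state_dim: int, repr_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, repr_dim),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)

def bc_pretrain(encoder, action_head, states, actions, epochs=50, lr=1e-3):
    """Phase 1: fit phi by predicting the dataset's actions from phi(s).

    Uses an MSE loss, assuming continuous actions; returns the final loss.
    """
    params = list(encoder.parameters()) + list(action_head.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(action_head(encoder(states)), actions)
        loss.backward()
        opt.step()
    return loss.item()

# Toy usage on a synthetic "dataset" from an unknown behavior policy.
torch.manual_seed(0)
states = torch.randn(256, 8)
actions = torch.tanh(states[:, :2])           # synthetic behavior actions
encoder = Encoder(state_dim=8, repr_dim=16)
action_head = nn.Linear(16, 2)                # used only during pre-training
init_loss = nn.functional.mse_loss(action_head(encoder(states)), actions).item()
final_loss = bc_pretrain(encoder, action_head, states, actions)

# Phase 2 (not shown): freeze phi and hand it to any offline RL algorithm,
# e.g. TD3+BC or CQL, which now operates on phi(s) instead of raw s:
for p in encoder.parameters():
    p.requires_grad_(False)
```

The behavior-cloning objective only shapes the representation; the action head is discarded afterwards, and the downstream offline RL algorithm never updates the encoder.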

Original language: English
Publication status: Published - 2023
Event: 11th International Conference on Learning Representations, ICLR 2023 - Kigali, Rwanda
Duration: 1 May 2023 → 5 May 2023

Conference

Conference: 11th International Conference on Learning Representations, ICLR 2023
Country/Territory: Rwanda
City: Kigali
Period: 1/05/23 → 5/05/23
