TY - JOUR
T1 - Understanding and Addressing the Pitfalls of Bisimulation-based Representations in Offline Reinforcement Learning
AU - Zang, Hongyu
AU - Li, Xin
AU - Zhang, Leiji
AU - Liu, Yang
AU - Sun, Baigui
AU - Islam, Riashat
AU - des Combes, Rémi Tachet
AU - Laroche, Romain
N1 - Publisher Copyright:
© 2023 Neural Information Processing Systems Foundation. All rights reserved.
PY - 2023
Y1 - 2023
N2 - While bisimulation-based approaches hold promise for learning robust state representations for Reinforcement Learning (RL) tasks, their efficacy in offline RL tasks has not been up to par. In some instances, their performance has even significantly underperformed alternative methods. We aim to understand why bisimulation methods succeed in online settings, but falter in offline tasks. Our analysis reveals that missing transitions in the dataset are particularly harmful to the bisimulation principle, leading to ineffective estimation. We also shed light on the critical role of reward scaling in bounding the scale of bisimulation measurements and of the value error they induce. Based on these findings, we propose to apply the expectile operator for representation learning to our offline RL setting, which helps to prevent overfitting to incomplete data. Meanwhile, by introducing an appropriate reward scaling strategy, we avoid the risk of feature collapse in representation space. We implement these recommendations on two state-of-the-art bisimulation-based algorithms, MICo and SimSR, and demonstrate performance gains on two benchmark suites: D4RL and Visual D4RL. Codes are provided at https://github.com/zanghyu/Offline_Bisimulation.
AB - While bisimulation-based approaches hold promise for learning robust state representations for Reinforcement Learning (RL) tasks, their efficacy in offline RL tasks has not been up to par. In some instances, their performance has even significantly underperformed alternative methods. We aim to understand why bisimulation methods succeed in online settings, but falter in offline tasks. Our analysis reveals that missing transitions in the dataset are particularly harmful to the bisimulation principle, leading to ineffective estimation. We also shed light on the critical role of reward scaling in bounding the scale of bisimulation measurements and of the value error they induce. Based on these findings, we propose to apply the expectile operator for representation learning to our offline RL setting, which helps to prevent overfitting to incomplete data. Meanwhile, by introducing an appropriate reward scaling strategy, we avoid the risk of feature collapse in representation space. We implement these recommendations on two state-of-the-art bisimulation-based algorithms, MICo and SimSR, and demonstrate performance gains on two benchmark suites: D4RL and Visual D4RL. Codes are provided at https://github.com/zanghyu/Offline_Bisimulation.
UR - http://www.scopus.com/inward/record.url?scp=85191163860&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85191163860
SN - 1049-5258
VL - 36
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023
Y2 - 10 December 2023 through 16 December 2023
ER -