TY - JOUR
T1 - Understanding and Addressing the Pitfalls of Bisimulation-based Representations in Offline Reinforcement Learning
AU - Zang, Hongyu
AU - Li, Xin
AU - Zhang, Leiji
AU - Liu, Yang
AU - Sun, Baigui
AU - Islam, Riashat
AU - des Combes, Rémi Tachet
AU - Laroche, Romain
N1 - Publisher Copyright:
© 2023 Neural Information Processing Systems Foundation. All rights reserved.
PY - 2023
Y1 - 2023
N2 - While bisimulation-based approaches hold promise for learning robust state representations for Reinforcement Learning (RL) tasks, their efficacy in offline RL tasks has not been up to par. In some instances, their performance has even significantly underperformed alternative methods. We aim to understand why bisimulation methods succeed in online settings, but falter in offline tasks. Our analysis reveals that missing transitions in the dataset are particularly harmful to the bisimulation principle, leading to ineffective estimation. We also shed light on the critical role of reward scaling in bounding the scale of bisimulation measurements and of the value error they induce. Based on these findings, we propose to apply the expectile operator for representation learning to our offline RL setting, which helps to prevent overfitting to incomplete data. Meanwhile, by introducing an appropriate reward scaling strategy, we avoid the risk of feature collapse in representation space. We implement these recommendations on two state-of-the-art bisimulation-based algorithms, MICo and SimSR, and demonstrate performance gains on two benchmark suites: D4RL and Visual D4RL. Codes are provided at https://github.com/zanghyu/Offline_Bisimulation.
AB - While bisimulation-based approaches hold promise for learning robust state representations for Reinforcement Learning (RL) tasks, their efficacy in offline RL tasks has not been up to par. In some instances, their performance has even significantly underperformed alternative methods. We aim to understand why bisimulation methods succeed in online settings, but falter in offline tasks. Our analysis reveals that missing transitions in the dataset are particularly harmful to the bisimulation principle, leading to ineffective estimation. We also shed light on the critical role of reward scaling in bounding the scale of bisimulation measurements and of the value error they induce. Based on these findings, we propose to apply the expectile operator for representation learning to our offline RL setting, which helps to prevent overfitting to incomplete data. Meanwhile, by introducing an appropriate reward scaling strategy, we avoid the risk of feature collapse in representation space. We implement these recommendations on two state-of-the-art bisimulation-based algorithms, MICo and SimSR, and demonstrate performance gains on two benchmark suites: D4RL and Visual D4RL. Codes are provided at https://github.com/zanghyu/Offline_Bisimulation.
UR - http://www.scopus.com/inward/record.url?scp=85191163860&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85191163860
SN - 1049-5258
VL - 36
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023
Y2 - 10 December 2023 through 16 December 2023
ER -