Understanding and Addressing the Pitfalls of Bisimulation-based Representations in Offline Reinforcement Learning

Hongyu Zang; Xin Li; Leiji Zhang; Yang Liu; Baigui Sun; Riashat Islam; Rémi Tachet des Combes; Romain Laroche

Corresponding author: Xin Li

School of Computer Science and Technology

Research output: Contribution to journal › Conference article › peer-review

1 Citation (Scopus)

Abstract

While bisimulation-based approaches hold promise for learning robust state representations for Reinforcement Learning (RL) tasks, their efficacy in offline RL tasks has not been up to par. In some instances, they have even significantly underperformed alternative methods. We aim to understand why bisimulation methods succeed in online settings but falter in offline tasks. Our analysis reveals that missing transitions in the dataset are particularly harmful to the bisimulation principle, leading to ineffective estimation. We also shed light on the critical role of reward scaling in bounding the scale of bisimulation measurements and of the value error they induce. Based on these findings, we propose to apply the expectile operator for representation learning to our offline RL setting, which helps to prevent overfitting to incomplete data. Meanwhile, by introducing an appropriate reward scaling strategy, we avoid the risk of feature collapse in representation space. We implement these recommendations on two state-of-the-art bisimulation-based algorithms, MICo and SimSR, and demonstrate performance gains on two benchmark suites: D4RL and Visual D4RL. Code is provided at https://github.com/zanghyu/Offline_Bisimulation.
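
For readers unfamiliar with the expectile operator mentioned in the abstract, the following is a minimal sketch of the generic expectile (asymmetric squared) loss and of the expectile of a sample. It is not taken from the paper or its repository: the function names `expectile_loss` and `expectile`, the default `tau=0.7`, and the fixed-point solver are illustrative assumptions, and how the authors combine the operator with MICo or SimSR is not shown here.

```python
def expectile_loss(residuals, tau=0.7):
    """Mean asymmetric squared loss (hypothetical helper, not the paper's code).

    Positive residuals are weighted by tau, non-positive ones by (1 - tau).
    With tau = 0.5 this reduces to half the mean squared error.
    """
    total = 0.0
    for u in residuals:
        weight = tau if u > 0 else 1.0 - tau
        total += weight * u * u
    return total / len(residuals)


def expectile(samples, tau=0.7, iters=100):
    """Tau-expectile of a sample via fixed-point iteration (illustrative).

    The expectile m minimises expectile_loss([x - m for x in samples], tau),
    so it satisfies m = sum(w_i * x_i) / sum(w_i), where w_i = tau when
    x_i > m and (1 - tau) otherwise.
    """
    m = sum(samples) / len(samples)  # start from the mean (the 0.5-expectile)
    for _ in range(iters):
        weights = [tau if x > m else 1.0 - tau for x in samples]
        m = sum(w * x for w, x in zip(weights, samples)) / sum(weights)
    return m
```

With `tau = 0.5` the expectile is the ordinary mean; moving `tau` toward 1 shifts the estimate toward the largest observed values while still depending only on samples that are actually present.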

Original language	English
Journal	Advances in Neural Information Processing Systems
Volume	36
Publication status	Published - 2023
Event	37th Conference on Neural Information Processing Systems, NeurIPS 2023 - New Orleans, United States
Duration	10 Dec 2023 → 16 Dec 2023

Cite this

@article{aaafad247e614a668b90fa332f32abd3,

title = "Understanding and Addressing the Pitfalls of Bisimulation-based Representations in Offline Reinforcement Learning",

abstract = "While bisimulation-based approaches hold promise for learning robust state representations for Reinforcement Learning (RL) tasks, their efficacy in offline RL tasks has not been up to par. In some instances, their performance has even significantly underperformed alternative methods. We aim to understand why bisimulation methods succeed in online settings, but falter in offline tasks. Our analysis reveals that missing transitions in the dataset are particularly harmful to the bisimulation principle, leading to ineffective estimation. We also shed light on the critical role of reward scaling in bounding the scale of bisimulation measurements and of the value error they induce. Based on these findings, we propose to apply the expectile operator for representation learning to our offline RL setting, which helps to prevent overfitting to incomplete data. Meanwhile, by introducing an appropriate reward scaling strategy, we avoid the risk of feature collapse in representation space. We implement these recommendations on two state-of-the-art bisimulation-based algorithms, MICo and SimSR, and demonstrate performance gains on two benchmark suites: D4RL and Visual D4RL. Codes are provided at https://github.com/zanghyu/Offline_Bisimulation.",

author = "Hongyu Zang and Xin Li and Leiji Zhang and Yang Liu and Baigui Sun and Riashat Islam and {des Combes}, {R{\'e}mi Tachet} and Romain Laroche",

note = "Publisher Copyright: {\textcopyright} 2023 Neural information processing systems foundation. All rights reserved.; 37th Conference on Neural Information Processing Systems, NeurIPS 2023 ; Conference date: 10-12-2023 Through 16-12-2023",

year = "2023",

language = "English",

volume = "36",

journal = "Advances in Neural Information Processing Systems",

issn = "1049-5258",

publisher = "Neural information processing systems foundation",

}

TY - JOUR

T1 - Understanding and Addressing the Pitfalls of Bisimulation-based Representations in Offline Reinforcement Learning

AU - Zang, Hongyu

AU - Li, Xin

AU - Zhang, Leiji

AU - Liu, Yang

AU - Sun, Baigui

AU - Islam, Riashat

AU - des Combes, Rémi Tachet

AU - Laroche, Romain

PY - 2023

Y1 - 2023

N2 - While bisimulation-based approaches hold promise for learning robust state representations for Reinforcement Learning (RL) tasks, their efficacy in offline RL tasks has not been up to par. In some instances, their performance has even significantly underperformed alternative methods. We aim to understand why bisimulation methods succeed in online settings, but falter in offline tasks. Our analysis reveals that missing transitions in the dataset are particularly harmful to the bisimulation principle, leading to ineffective estimation. We also shed light on the critical role of reward scaling in bounding the scale of bisimulation measurements and of the value error they induce. Based on these findings, we propose to apply the expectile operator for representation learning to our offline RL setting, which helps to prevent overfitting to incomplete data. Meanwhile, by introducing an appropriate reward scaling strategy, we avoid the risk of feature collapse in representation space. We implement these recommendations on two state-of-the-art bisimulation-based algorithms, MICo and SimSR, and demonstrate performance gains on two benchmark suites: D4RL and Visual D4RL. Codes are provided at https://github.com/zanghyu/Offline_Bisimulation.

AB - While bisimulation-based approaches hold promise for learning robust state representations for Reinforcement Learning (RL) tasks, their efficacy in offline RL tasks has not been up to par. In some instances, their performance has even significantly underperformed alternative methods. We aim to understand why bisimulation methods succeed in online settings, but falter in offline tasks. Our analysis reveals that missing transitions in the dataset are particularly harmful to the bisimulation principle, leading to ineffective estimation. We also shed light on the critical role of reward scaling in bounding the scale of bisimulation measurements and of the value error they induce. Based on these findings, we propose to apply the expectile operator for representation learning to our offline RL setting, which helps to prevent overfitting to incomplete data. Meanwhile, by introducing an appropriate reward scaling strategy, we avoid the risk of feature collapse in representation space. We implement these recommendations on two state-of-the-art bisimulation-based algorithms, MICo and SimSR, and demonstrate performance gains on two benchmark suites: D4RL and Visual D4RL. Codes are provided at https://github.com/zanghyu/Offline_Bisimulation.

UR - http://www.scopus.com/inward/record.url?scp=85191163860&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85191163860

SN - 1049-5258

VL - 36

JO - Advances in Neural Information Processing Systems

JF - Advances in Neural Information Processing Systems

T2 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023

Y2 - 10 December 2023 through 16 December 2023

ER -
