BEHAVIOR PRIOR REPRESENTATION LEARNING FOR OFFLINE REINFORCEMENT LEARNING

Hongyu Zang, Xin Li*, Jie Yu, Chen Liu, Riashat Islam, Rémi Tachet des Combes, Romain Laroche

*Corresponding author for this work

School of Computer Science and Technology

Research output: Contribution to conference › Paper › peer-review

5 Citations (Scopus)

Abstract

Offline reinforcement learning (RL) struggles in environments with rich and noisy inputs, where the agent only has access to a fixed dataset without environment interactions. Past works have proposed common workarounds based on the pre-training of state representations, followed by policy training. In this work, we introduce a simple, yet effective approach for learning state representations. Our method, Behavior Prior Representation (BPR), learns state representations with an easy-to-integrate objective based on behavior cloning of the dataset: we first learn a state representation by mimicking actions from the dataset, and then train a policy on top of the fixed representation, using any off-the-shelf Offline RL algorithm. Theoretically, we prove that BPR carries out performance guarantees when integrated into algorithms that have either policy improvement guarantees (conservative algorithms) or produce lower bounds of the policy values (pessimistic algorithms). Empirically, we show that BPR combined with existing state-of-the-art Offline RL algorithms leads to significant improvements across several offline control benchmarks. The code is available at https://github.com/bit1029public/offline_bpr.
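
The two-stage recipe described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (their code is at the repository linked above). It assumes continuous actions, a one-layer tanh encoder, a linear action head, a mean-squared-error cloning loss, and plain full-batch gradient descent. The function names `pretrain_bpr_encoder` and `encode` are hypothetical and chosen only for this example.

```python
import numpy as np


def pretrain_bpr_encoder(states, actions, rep_dim=8, lr=0.05, steps=500, seed=0):
    """Stage 1 (sketch): learn a state encoder by behavior cloning.

    Assumed model (illustrative only): a_hat = tanh(s @ W_enc) @ W_head,
    trained to reproduce the dataset actions under a mean-squared-error loss.
    Returns the encoder weights and the per-step training losses.
    """
    rng = np.random.default_rng(seed)
    s_dim = states.shape[1]
    a_dim = actions.shape[1]
    W_enc = rng.normal(scale=0.1, size=(s_dim, rep_dim))
    W_head = rng.normal(scale=0.1, size=(rep_dim, a_dim))
    losses = []
    for _ in range(steps):
        z = np.tanh(states @ W_enc)       # representation, shape (n, rep_dim)
        err = z @ W_head - actions        # cloning error, shape (n, a_dim)
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean-squared error through head and encoder.
        g_out = 2.0 * err / err.size
        g_head = z.T @ g_out
        g_z = g_out @ W_head.T
        g_enc = states.T @ (g_z * (1.0 - z ** 2))  # tanh'(x) = 1 - tanh(x)^2
        W_head -= lr * g_head
        W_enc -= lr * g_enc
    return W_enc, losses


def encode(states, W_enc):
    """Stage 2 input (sketch): map raw states through the frozen encoder."""
    return np.tanh(states @ W_enc)


# Synthetic stand-in for an offline dataset (an assumption for this example).
rng = np.random.default_rng(1)
dataset_states = rng.normal(size=(256, 4))
dataset_actions = np.tanh(dataset_states @ rng.normal(size=(4, 2)))

W_enc, losses = pretrain_bpr_encoder(dataset_states, dataset_actions)
fixed_representation = encode(dataset_states, W_enc)
# Stage 2: hand `fixed_representation` (with the dataset's actions, rewards
# and next-state encodings) to any offline RL algorithm; W_enc is not updated.
```

The action head is discarded after pre-training. Only the encoder is carried into the second stage, where it stays fixed while the offline RL algorithm of choice trains the policy on the encoded states.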

Original language	English
Publication status	Published - 2023
Event	11th International Conference on Learning Representations, ICLR 2023 - Kigali, Rwanda, 1 May 2023 → 5 May 2023

Conference

Conference	11th International Conference on Learning Representations, ICLR 2023
Country/Territory	Rwanda
City	Kigali
Period	1/05/23 → 5/05/23

Cite this

@conference{300bc44f854d4437b86f8a5294399653,

title = "BEHAVIOR PRIOR REPRESENTATION LEARNING FOR OFFLINE REINFORCEMENT LEARNING",

abstract = "Offline reinforcement learning (RL) struggles in environments with rich and noisy inputs, where the agent only has access to a fixed dataset without environment interactions. Past works have proposed common workarounds based on the pre-training of state representations, followed by policy training. In this work, we introduce a simple, yet effective approach for learning state representations. Our method, Behavior Prior Representation (BPR), learns state representations with an easy-to-integrate objective based on behavior cloning of the dataset: we first learn a state representation by mimicking actions from the dataset, and then train a policy on top of the fixed representation, using any off-the-shelf Offline RL algorithm. Theoretically, we prove that BPR carries out performance guarantees when integrated into algorithms that have either policy improvement guarantees (conservative algorithms) or produce lower bounds of the policy values (pessimistic algorithms). Empirically, we show that BPR combined with existing state-of-the-art Offline RL algorithms leads to significant improvements across several offline control benchmarks. The code is available at https://github.com/bit1029public/offline_bpr.",

author = "Hongyu Zang and Xin Li and Jie Yu and Chen Liu and Riashat Islam and {des Combes}, {R{\'e}mi Tachet} and Romain Laroche",

note = "Publisher Copyright: {\textcopyright} 2023 11th International Conference on Learning Representations, ICLR 2023. All rights reserved.; 11th International Conference on Learning Representations, ICLR 2023 ; Conference date: 01-05-2023 Through 05-05-2023",

year = "2023",

language = "English",

}

TY - CONF

T1 - BEHAVIOR PRIOR REPRESENTATION LEARNING FOR OFFLINE REINFORCEMENT LEARNING

AU - Zang, Hongyu

AU - Li, Xin

AU - Yu, Jie

AU - Liu, Chen

AU - Islam, Riashat

AU - des Combes, Rémi Tachet

AU - Laroche, Romain

PY - 2023

Y1 - 2023

N2 - Offline reinforcement learning (RL) struggles in environments with rich and noisy inputs, where the agent only has access to a fixed dataset without environment interactions. Past works have proposed common workarounds based on the pre-training of state representations, followed by policy training. In this work, we introduce a simple, yet effective approach for learning state representations. Our method, Behavior Prior Representation (BPR), learns state representations with an easy-to-integrate objective based on behavior cloning of the dataset: we first learn a state representation by mimicking actions from the dataset, and then train a policy on top of the fixed representation, using any off-the-shelf Offline RL algorithm. Theoretically, we prove that BPR carries out performance guarantees when integrated into algorithms that have either policy improvement guarantees (conservative algorithms) or produce lower bounds of the policy values (pessimistic algorithms). Empirically, we show that BPR combined with existing state-of-the-art Offline RL algorithms leads to significant improvements across several offline control benchmarks. The code is available at https://github.com/bit1029public/offline_bpr.

UR - http://www.scopus.com/inward/record.url?scp=85172120751&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:85172120751

T2 - 11th International Conference on Learning Representations, ICLR 2023

Y2 - 1 May 2023 through 5 May 2023

ER -
