Learning Latent Dynamic Robust Representations for World Models

Ruixiang Sun, Hongyu Zang, Xin Li^*, Riashat Islam

^*Corresponding author for this work

School of Computer Science and Technology

Research output: Contribution to journal › Conference article › peer-review

Abstract

Visual Model-Based Reinforcement Learning (MBRL) promises to encapsulate an agent’s knowledge about the underlying dynamics of the environment, enabling learning a world model as a useful planner. However, top MBRL agents such as Dreamer often struggle with visual pixel-based inputs in the presence of exogenous or irrelevant noise in the observation space, due to failure to capture task-specific features while filtering out irrelevant spatio-temporal details. To tackle this problem, we apply a spatio-temporal masking strategy, a bisimulation principle, combined with latent reconstruction, to capture endogenous task-specific aspects of the environment for world models, effectively eliminating non-essential information. Joint training of representations, dynamics, and policy often leads to instabilities. To further address this issue, we develop a Hybrid Recurrent State-Space Model (HRSSM) structure, enhancing state representation robustness for effective policy learning. Our empirical evaluation demonstrates significant performance improvements over existing methods in a range of visually complex control tasks such as Maniskill (Gu et al., 2023) with exogenous distractors from the Matterport environment. Our code is available at https://github.com/bit1029public/HRSSM.

Original language	English
Pages (from-to)	47234-47260
Number of pages	27
Journal	Proceedings of Machine Learning Research
Volume	235
Publication status	Published - 2024
Event	41st International Conference on Machine Learning, ICML 2024 - Vienna, Austria Duration: 21 Jul 2024 → 27 Jul 2024

Cite this

@article{ac56c8b4503c4bc982a596ef7f374ae7,

title = "Learning Latent Dynamic Robust Representations for World Models",

abstract = "Visual Model-Based Reinforcement Learning (MBRL) promises to encapsulate an agent{\textquoteright}s knowledge about the underlying dynamics of the environment, enabling learning a world model as a useful planner. However, top MBRL agents such as Dreamer often struggle with visual pixel-based inputs in the presence of exogenous or irrelevant noise in the observation space, due to failure to capture task-specific features while filtering out irrelevant spatio-temporal details. To tackle this problem, we apply a spatio-temporal masking strategy, a bisimulation principle, combined with latent reconstruction, to capture endogenous task-specific aspects of the environment for world models, effectively eliminating non-essential information. Joint training of representations, dynamics, and policy often leads to instabilities. To further address this issue, we develop a Hybrid Recurrent State-Space Model (HRSSM) structure, enhancing state representation robustness for effective policy learning. Our empirical evaluation demonstrates significant performance improvements over existing methods in a range of visually complex control tasks such as Maniskill (Gu et al., 2023) with exogenous distractors from the Matterport environment. Our code is available at https://github.com/bit1029public/HRSSM.",

author = "Ruixiang Sun and Hongyu Zang and Xin Li and Riashat Islam",

year = "2024",

language = "English",

volume = "235",

pages = "47234--47260",

journal = "Proceedings of Machine Learning Research",

issn = "2640-3498",

publisher = "ML Research Press",

}

TY - JOUR

T1 - Learning Latent Dynamic Robust Representations for World Models

AU - Sun, Ruixiang

AU - Zang, Hongyu

AU - Li, Xin

AU - Islam, Riashat

PY - 2024

Y1 - 2024

N2 - Visual Model-Based Reinforcement Learning (MBRL) promises to encapsulate an agent’s knowledge about the underlying dynamics of the environment, enabling learning a world model as a useful planner. However, top MBRL agents such as Dreamer often struggle with visual pixel-based inputs in the presence of exogenous or irrelevant noise in the observation space, due to failure to capture task-specific features while filtering out irrelevant spatio-temporal details. To tackle this problem, we apply a spatio-temporal masking strategy, a bisimulation principle, combined with latent reconstruction, to capture endogenous task-specific aspects of the environment for world models, effectively eliminating non-essential information. Joint training of representations, dynamics, and policy often leads to instabilities. To further address this issue, we develop a Hybrid Recurrent State-Space Model (HRSSM) structure, enhancing state representation robustness for effective policy learning. Our empirical evaluation demonstrates significant performance improvements over existing methods in a range of visually complex control tasks such as Maniskill (Gu et al., 2023) with exogenous distractors from the Matterport environment. Our code is available at https://github.com/bit1029public/HRSSM.

AB - Visual Model-Based Reinforcement Learning (MBRL) promises to encapsulate an agent’s knowledge about the underlying dynamics of the environment, enabling learning a world model as a useful planner. However, top MBRL agents such as Dreamer often struggle with visual pixel-based inputs in the presence of exogenous or irrelevant noise in the observation space, due to failure to capture task-specific features while filtering out irrelevant spatio-temporal details. To tackle this problem, we apply a spatio-temporal masking strategy, a bisimulation principle, combined with latent reconstruction, to capture endogenous task-specific aspects of the environment for world models, effectively eliminating non-essential information. Joint training of representations, dynamics, and policy often leads to instabilities. To further address this issue, we develop a Hybrid Recurrent State-Space Model (HRSSM) structure, enhancing state representation robustness for effective policy learning. Our empirical evaluation demonstrates significant performance improvements over existing methods in a range of visually complex control tasks such as Maniskill (Gu et al., 2023) with exogenous distractors from the Matterport environment. Our code is available at https://github.com/bit1029public/HRSSM.

UR - http://www.scopus.com/inward/record.url?scp=85203812670&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85203812670

SN - 2640-3498

VL - 235

SP - 47234

EP - 47260

JO - Proceedings of Machine Learning Research

JF - Proceedings of Machine Learning Research

T2 - 41st International Conference on Machine Learning, ICML 2024

Y2 - 21 July 2024 through 27 July 2024

ER -
