TY - JOUR
T1 - STORM: Efficient Stochastic Transformer based World Models for Reinforcement Learning
T2 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023
AU - Zhang, Weipu
AU - Wang, Gang
AU - Sun, Jian
AU - Yuan, Yetian
AU - Huang, Gao
N1 - Publisher Copyright:
© 2023 Neural Information Processing Systems Foundation. All rights reserved.
PY - 2023
Y1 - 2023
N2 - Recently, model-based reinforcement learning algorithms have demonstrated remarkable efficacy in visual input environments. These approaches begin by constructing a parameterized simulation world model of the real environment through self-supervised learning. By leveraging the imagination of the world model, the agent's policy is enhanced without the constraints of sampling from the real environment. The performance of these algorithms heavily relies on the sequence modeling and generation capabilities of the world model. However, constructing a perfectly accurate model of a complex unknown environment is nearly impossible. Discrepancies between the model and reality may cause the agent to pursue virtual goals, resulting in subpar performance in the real environment. Introducing random noise into model-based reinforcement learning has been proven beneficial. In this work, we introduce Stochastic Transformer-based wORld Model (STORM), an efficient world model architecture that combines the strong sequence modeling and generation capabilities of Transformers with the stochastic nature of variational autoencoders. STORM achieves a mean human performance of 126.7% on the Atari 100k benchmark, setting a new record among state-of-the-art methods that do not employ lookahead search techniques. Moreover, training an agent with 1.85 hours of real-time interaction experience on a single NVIDIA GeForce RTX 3090 graphics card requires only 4.3 hours, showcasing improved efficiency compared to previous methodologies. We release our code at https://github.com/weipu-zhang/STORM.
AB - Recently, model-based reinforcement learning algorithms have demonstrated remarkable efficacy in visual input environments. These approaches begin by constructing a parameterized simulation world model of the real environment through self-supervised learning. By leveraging the imagination of the world model, the agent's policy is enhanced without the constraints of sampling from the real environment. The performance of these algorithms heavily relies on the sequence modeling and generation capabilities of the world model. However, constructing a perfectly accurate model of a complex unknown environment is nearly impossible. Discrepancies between the model and reality may cause the agent to pursue virtual goals, resulting in subpar performance in the real environment. Introducing random noise into model-based reinforcement learning has been proven beneficial. In this work, we introduce Stochastic Transformer-based wORld Model (STORM), an efficient world model architecture that combines the strong sequence modeling and generation capabilities of Transformers with the stochastic nature of variational autoencoders. STORM achieves a mean human performance of 126.7% on the Atari 100k benchmark, setting a new record among state-of-the-art methods that do not employ lookahead search techniques. Moreover, training an agent with 1.85 hours of real-time interaction experience on a single NVIDIA GeForce RTX 3090 graphics card requires only 4.3 hours, showcasing improved efficiency compared to previous methodologies. We release our code at https://github.com/weipu-zhang/STORM.
UR - http://www.scopus.com/inward/record.url?scp=85183367325&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85183367325
SN - 1049-5258
VL - 36
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
Y2 - 10 December 2023 through 16 December 2023
ER -