On improving the learning of long-term historical information for tasks with partial observability

Xinwen Wang, Xin Li, Linjing Lai*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding - Conference contribution - peer-review

Abstract

Reinforcement learning (RL) has been recognized as a powerful tool for many real-world tasks in decision making, data mining, and information retrieval. Many RL algorithms have been well developed; however, tasks set in partially observable environments, i.e., POMDPs (Partially Observable Markov Decision Processes), remain very challenging. Recent attempts to address this issue memorize long-term historical information with deep neural networks. The common strategy is to leverage recurrent networks, e.g., Long Short-Term Memory (LSTM), to retain/encode the historical information and estimate the true state of the environment under partial observability. However, when confronted with problems that depend on rather long histories and involve irregular data sampling, the conventional LSTM is ill-suited and difficult to train, owing to the well-known vanishing-gradient problem and its inadequacy in capturing long-term history. In this paper, we propose to utilize Phased LSTM for POMDP tasks; it introduces an additional time gate that periodically updates the memory cell, helping the neural framework to 1) maintain long-term information and 2) propagate gradients better, thereby facilitating the training of reinforcement learning models with a recurrent structure. To further adapt to reinforcement learning and boost performance, we also propose a Self-Phased LSTM that incorporates a periodic gate; it generates a dynamic periodic gate that adjusts automatically to a wider range of tasks, especially the notoriously difficult ones with sparse rewards. Our experimental results verify the effectiveness of leveraging Phased LSTM and Self-Phased LSTM for POMDP tasks.
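For context, the time gate the abstract refers to follows the standard Phased LSTM formulation (Neil et al., 2016). The sketch below illustrates that published mechanism only; the parameter names (tau, s, r_on, alpha) are illustrative assumptions and it does not reproduce the authors' Self-Phased variant.

# Sketch of the Phased LSTM time gate (after Neil et al., 2016).
# Parameter names (tau, s, r_on, alpha) are illustrative, not taken from this paper.
def time_gate_openness(t, tau, s, r_on, alpha=1e-3):
    """Openness k_t of the time gate at timestamp t.

    tau   : oscillation period of the gate
    s     : phase shift of the oscillation
    r_on  : fraction of the period during which the gate is open
    alpha : small leak rate while the gate is closed (keeps gradients flowing)
    """
    phi = ((t - s) % tau) / tau          # phase of the oscillation, in [0, 1)
    if phi < 0.5 * r_on:
        return 2.0 * phi / r_on          # first half of the open phase: gate rising
    elif phi < r_on:
        return 2.0 - 2.0 * phi / r_on    # second half of the open phase: gate falling
    else:
        return alpha * phi               # closed phase: small leak only

# The gate interpolates between the proposed and previous states, so the
# memory cell is only fully rewritten once per period:
#   c_t = k_t * c_tilde_t + (1 - k_t) * c_{t-1}
#   h_t = k_t * h_tilde_t + (1 - k_t) * h_{t-1}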

Original language: English
Title of host publication: Proceedings - 2020 IEEE 5th International Conference on Data Science in Cyberspace, DSC 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 232-237
Number of pages: 6
ISBN (Electronic): 9781728195582
DOIs
Publication status: Published - Jul 2020
Event: 5th IEEE International Conference on Data Science in Cyberspace, DSC 2020 - Hong Kong, China
Duration: 27 Jul 2020 - 29 Jul 2020

Publication series

Name: Proceedings - 2020 IEEE 5th International Conference on Data Science in Cyberspace, DSC 2020

Conference

Conference: 5th IEEE International Conference on Data Science in Cyberspace, DSC 2020
Country/Territory: China
City: Hong Kong
Period: 27/07/20 - 29/07/20

Keywords

  • LSTM
  • POMDPs
  • Reinforcement Learning
  • Time Gate
