On improving the learning of long-term historical information for tasks with partial observability

Xinwen Wang, Xin Li, Linjing Lai*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Reinforcement learning (RL) has been recognized as a powerful tool for many real-world tasks in decision making, data mining, and information retrieval. Many RL algorithms have been developed; however, tasks in partially observable environments, e.g., POMDPs (Partially Observable Markov Decision Processes), remain very challenging. Recent attempts to address this issue memorize long-term historical information using deep neural networks. The common strategy is to leverage recurrent networks, e.g., Long Short-Term Memory (LSTM), to retain and encode the historical information in order to estimate the true state of the environment under partial observability. However, when confronted with problems that depend on a rather long history or with irregular data sampling, the conventional LSTM is ill-suited and difficult to train, owing to the well-known vanishing gradient problem and its inadequacy in capturing long-term history. In this paper, we propose to use Phased LSTM to solve POMDP tasks. Phased LSTM introduces an additional time gate that periodically updates the memory cell, helping the network to 1) maintain long-term information and 2) propagate the gradient better, which facilitates the training of reinforcement learning models with recurrent structure. To further adapt to reinforcement learning and boost performance, we also propose a Self-Phased LSTM that incorporates a dynamically generated periodic gate, so that it adjusts automatically to more tasks, especially the notoriously difficult ones with sparse rewards. Our experimental results verify the effectiveness of Phased LSTM and Self-Phased LSTM for POMDP tasks.
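
The time gate described in the abstract can be illustrated with a minimal sketch. It assumes the gate follows the commonly published Phased LSTM formulation (a periodic gate with period tau, phase shift, and open ratio r_on); the abstract does not spell out the exact equations, so the function names, parameter names, and the leak value alpha below are illustrative assumptions rather than the authors' implementation. The Self-Phased variant is not shown, since the abstract gives no detail on how its gate is generated.

```python
def time_gate(t, tau, shift, r_on, alpha=0.001):
    """Openness k_t of a Phased-LSTM-style time gate at timestamp t.

    Assumed parameters (hypothetical names):
      tau   - oscillation period
      shift - phase shift
      r_on  - fraction of the period during which the gate is open
      alpha - small leak rate while the gate is closed
    """
    # Phase within the current period, in [0, 1).
    phi = ((t - shift) % tau) / tau
    if phi < 0.5 * r_on:
        return 2.0 * phi / r_on        # first half of the open phase: gate rises 0 -> 1
    if phi < r_on:
        return 2.0 - 2.0 * phi / r_on  # second half of the open phase: gate falls 1 -> 0
    return alpha * phi                 # closed phase: only a small leak passes


def gated_update(c_prev, c_candidate, k):
    """Blend the candidate memory cell with the previous one using gate openness k."""
    return k * c_candidate + (1.0 - k) * c_prev


if __name__ == "__main__":
    # With tau=8 and r_on=0.5 the gate is open for the first half of each period.
    for t in range(0, 9):
        print(t, time_gate(float(t), tau=8.0, shift=0.0, r_on=0.5))
```

Because the cell is left almost unchanged while the gate is closed, its contents (and the gradient flowing through it) survive across many timesteps, which matches the two benefits the abstract lists.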

Original language: English
Title of host publication: Proceedings - 2020 IEEE 5th International Conference on Data Science in Cyberspace, DSC 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 232-237
Number of pages: 6
ISBN (Electronic): 9781728195582
DOI
Publication status: Published - Jul 2020
Event: 5th IEEE International Conference on Data Science in Cyberspace, DSC 2020 - Hong Kong, China
Duration: 27 Jul 2020 to 29 Jul 2020

Publication series

Name: Proceedings - 2020 IEEE 5th International Conference on Data Science in Cyberspace, DSC 2020

Conference

Conference: 5th IEEE International Conference on Data Science in Cyberspace, DSC 2020
Country/Territory: China
City: Hong Kong
Period: 27/07/20 to 29/07/20
