TY - GEN
T1 - On improving the learning of long-term historical information for tasks with partial observability
AU - Wang, Xinwen
AU - Li, Xin
AU - Lai, Linjing
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/7
Y1 - 2020/7
N2 - Reinforcement learning (RL) has been recognized as a powerful tool for handling many real-world tasks in decision making, data mining, and information retrieval. Although many well-developed RL algorithms exist, tasks involving partially observable environments, e.g., POMDPs (Partially Observable Markov Decision Processes), remain very challenging. Recent attempts to address this issue memorize long-term historical information using deep neural networks, and the common strategy is to leverage recurrent networks, e.g., Long Short-Term Memory (LSTM), to retain/encode the historical information and estimate the true state of the environment under partial observability. However, when confronted with long history-dependent problems and irregular data sampling, the conventional LSTM is ill-suited and difficult to train, due to the well-known vanishing gradient problem and its inadequacy at capturing long-term history. In this paper, we propose to utilize Phased LSTM to solve POMDP tasks; it introduces an additional time gate that periodically updates the memory cell, helping the neural framework to 1) maintain long-term information, and 2) propagate gradients better, facilitating the training of reinforcement learning models with a recurrent structure. To further adapt to reinforcement learning and boost performance, we also propose a Self-Phased LSTM incorporating a periodic gate, which generates a dynamic period that adjusts automatically across tasks, especially the notorious ones with sparse rewards. Our experimental results verify the effectiveness of leveraging Phased LSTM and Self-Phased LSTM for POMDP tasks.
AB - Reinforcement learning (RL) has been recognized as a powerful tool for handling many real-world tasks in decision making, data mining, and information retrieval. Although many well-developed RL algorithms exist, tasks involving partially observable environments, e.g., POMDPs (Partially Observable Markov Decision Processes), remain very challenging. Recent attempts to address this issue memorize long-term historical information using deep neural networks, and the common strategy is to leverage recurrent networks, e.g., Long Short-Term Memory (LSTM), to retain/encode the historical information and estimate the true state of the environment under partial observability. However, when confronted with long history-dependent problems and irregular data sampling, the conventional LSTM is ill-suited and difficult to train, due to the well-known vanishing gradient problem and its inadequacy at capturing long-term history. In this paper, we propose to utilize Phased LSTM to solve POMDP tasks; it introduces an additional time gate that periodically updates the memory cell, helping the neural framework to 1) maintain long-term information, and 2) propagate gradients better, facilitating the training of reinforcement learning models with a recurrent structure. To further adapt to reinforcement learning and boost performance, we also propose a Self-Phased LSTM incorporating a periodic gate, which generates a dynamic period that adjusts automatically across tasks, especially the notorious ones with sparse rewards. Our experimental results verify the effectiveness of leveraging Phased LSTM and Self-Phased LSTM for POMDP tasks.
KW - LSTM
KW - POMDPs
KW - Reinforcement Learning
KW - Time Gate
UR - http://www.scopus.com/inward/record.url?scp=85092060346&partnerID=8YFLogxK
U2 - 10.1109/DSC50466.2020.00042
DO - 10.1109/DSC50466.2020.00042
M3 - Conference contribution
AN - SCOPUS:85092060346
T3 - Proceedings - 2020 IEEE 5th International Conference on Data Science in Cyberspace, DSC 2020
SP - 232
EP - 237
BT - Proceedings - 2020 IEEE 5th International Conference on Data Science in Cyberspace, DSC 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 5th IEEE International Conference on Data Science in Cyberspace, DSC 2020
Y2 - 27 July 2020 through 29 July 2020
ER -