On improving the learning of long-term historical information for tasks with partial observability

Xinwen Wang; Xin Li; Linjing Lai

doi:10.1109/DSC50466.2020.00042

Xinwen Wang, Xin Li, Linjing Lai^*

^*Corresponding author for this work

School of Computer Science

Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Reinforcement learning (RL) has been recognized as a powerful tool for many real-world tasks in decision making, data mining, and information retrieval. Many RL algorithms are well developed; however, tasks in partially observable environments, e.g., POMDPs (Partially Observable Markov Decision Processes), remain very challenging. Recent attempts to address this issue memorize long-term historical information using deep neural networks. The common strategy is to leverage recurrent networks, e.g., Long Short-Term Memory (LSTM), to retain and encode historical information in order to estimate the true state of the environment under partial observability. However, when confronted with problems that depend on a rather long history and with irregular data sampling, the conventional LSTM is ill-suited and difficult to train because of the well-known vanishing-gradient problem and its inadequacy in capturing long-term history. In this paper, we propose to use Phased LSTM for POMDP tasks. Phased LSTM introduces an additional time gate that periodically updates the memory cell, helping the network to 1) maintain long-term information and 2) propagate gradients better, which facilitates the training of reinforcement learning models with recurrent structure. To further adapt to reinforcement learning and boost performance, we also propose a Self-Phased LSTM that incorporates a periodic gate and can generate a dynamic periodic gate that adjusts automatically to more tasks, especially the notoriously difficult ones with sparse rewards. Our experimental results verify the effectiveness of Phased LSTM and Self-Phased LSTM for POMDP tasks.
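The abstract does not give the gate equations. As a rough illustration only, the sketch below follows the time-gate formulation from the original Phased LSTM work (Neil et al., 2016) and is not this paper's implementation. The names `tau` (period), `shift` (phase offset), `r_on` (open ratio), and `alpha` (leak rate) are assumptions chosen for readability, and the Self-Phased variant is not sketched because the abstract does not define it.

```python
import numpy as np

def time_gate(t, tau, shift, r_on, alpha=0.001):
    """Openness k_t of a Phased LSTM style time gate (assumed formulation)."""
    t = np.asarray(t, dtype=float)
    phi = np.mod(t - shift, tau) / tau      # phase within the period, in [0, 1)
    rising = 2.0 * phi / r_on               # first half of the open phase
    falling = 2.0 - 2.0 * phi / r_on        # second half of the open phase
    leak = alpha * phi                      # closed phase: small leak only
    return np.where(phi < 0.5 * r_on, rising,
                    np.where(phi < r_on, falling, leak))

def gated_update(c_prev, c_candidate, k):
    """Blend the proposed cell state with the previous one by gate openness."""
    return k * c_candidate + (1.0 - k) * c_prev

if __name__ == "__main__":
    timestamps = np.array([1.0, 3.0, 5.0, 12.0])   # may be irregularly spaced
    k = time_gate(timestamps, tau=10.0, shift=0.0, r_on=0.4)
    print(k)
    print(gated_update(c_prev=1.0, c_candidate=3.0, k=k))
```

Because the gate is closed for most of each period, the memory cell is mostly copied forward unchanged, which is the mechanism the abstract credits for retaining long-term information and easing gradient propagation.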

Original language	English
Title of host publication	Proceedings - 2020 IEEE 5th International Conference on Data Science in Cyberspace, DSC 2020
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	232-237
Number of pages	6
ISBN (Electronic)	9781728195582
DOI	https://doi.org/10.1109/DSC50466.2020.00042
Publication status	Published - Jul 2020
Event	5th IEEE International Conference on Data Science in Cyberspace, DSC 2020 - Hong Kong, China. Duration: 27 Jul 2020 → 29 Jul 2020

Publication series

Name	Proceedings - 2020 IEEE 5th International Conference on Data Science in Cyberspace, DSC 2020

Conference

Conference	5th IEEE International Conference on Data Science in Cyberspace, DSC 2020
Country/Territory	China
City	Hong Kong
Period	27/07/20 → 29/07/20

Access to Document

10.1109/DSC50466.2020.00042

Cite this

Wang, X., Li, X., & Lai, L. (2020). On improving the learning of long-term historical information for tasks with partial observability. In Proceedings - 2020 IEEE 5th International Conference on Data Science in Cyberspace, DSC 2020 (pp. 232-237). Article 9172883 (Proceedings - 2020 IEEE 5th International Conference on Data Science in Cyberspace, DSC 2020). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/DSC50466.2020.00042

Wang, Xinwen ; Li, Xin ; Lai, Linjing. / On improving the learning of long-term historical information for tasks with partial observability. Proceedings - 2020 IEEE 5th International Conference on Data Science in Cyberspace, DSC 2020. Institute of Electrical and Electronics Engineers Inc., 2020. pp. 232-237 (Proceedings - 2020 IEEE 5th International Conference on Data Science in Cyberspace, DSC 2020).

@inproceedings{ad41180098cf4344b4ad8298b4a31c03,
title = "On improving the learning of long-term historical information for tasks with partial observability",
abstract = "Reinforcement learning (RL) has been recognized as a powerful tool for many real-world tasks in decision making, data mining, and information retrieval. Many RL algorithms are well developed; however, tasks in partially observable environments, e.g., POMDPs (Partially Observable Markov Decision Processes), remain very challenging. Recent attempts to address this issue memorize long-term historical information using deep neural networks. The common strategy is to leverage recurrent networks, e.g., Long Short-Term Memory (LSTM), to retain and encode historical information in order to estimate the true state of the environment under partial observability. However, when confronted with problems that depend on a rather long history and with irregular data sampling, the conventional LSTM is ill-suited and difficult to train because of the well-known vanishing-gradient problem and its inadequacy in capturing long-term history. In this paper, we propose to use Phased LSTM for POMDP tasks. Phased LSTM introduces an additional time gate that periodically updates the memory cell, helping the network to 1) maintain long-term information and 2) propagate gradients better, which facilitates the training of reinforcement learning models with recurrent structure. To further adapt to reinforcement learning and boost performance, we also propose a Self-Phased LSTM that incorporates a periodic gate and can generate a dynamic periodic gate that adjusts automatically to more tasks, especially the notoriously difficult ones with sparse rewards. Our experimental results verify the effectiveness of Phased LSTM and Self-Phased LSTM for POMDP tasks.",
keywords = "LSTM, POMDPs, Reinforcement Learning, Time Gate",
author = "Xinwen Wang and Xin Li and Linjing Lai",
note = "Publisher Copyright: {\textcopyright} 2020 IEEE.; 5th IEEE International Conference on Data Science in Cyberspace, DSC 2020 ; Conference date: 27-07-2020 Through 29-07-2020",
year = "2020",
month = jul,
doi = "10.1109/DSC50466.2020.00042",
language = "English",
series = "Proceedings - 2020 IEEE 5th International Conference on Data Science in Cyberspace, DSC 2020",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "232--237",
booktitle = "Proceedings - 2020 IEEE 5th International Conference on Data Science in Cyberspace, DSC 2020",
address = "United States",
}

Wang, X, Li, X & Lai, L 2020, On improving the learning of long-term historical information for tasks with partial observability. in Proceedings - 2020 IEEE 5th International Conference on Data Science in Cyberspace, DSC 2020., 9172883, Proceedings - 2020 IEEE 5th International Conference on Data Science in Cyberspace, DSC 2020, Institute of Electrical and Electronics Engineers Inc., pp. 232-237, 5th IEEE International Conference on Data Science in Cyberspace, DSC 2020, Hong Kong, China, 27/07/20. https://doi.org/10.1109/DSC50466.2020.00042

On improving the learning of long-term historical information for tasks with partial observability. / Wang, Xinwen; Li, Xin; Lai, Linjing.
Proceedings - 2020 IEEE 5th International Conference on Data Science in Cyberspace, DSC 2020. Institute of Electrical and Electronics Engineers Inc., 2020. pp. 232-237 9172883 (Proceedings - 2020 IEEE 5th International Conference on Data Science in Cyberspace, DSC 2020).

TY  - GEN
T1  - On improving the learning of long-term historical information for tasks with partial observability
AU  - Wang, Xinwen
AU  - Li, Xin
AU  - Lai, Linjing
PY  - 2020/7
Y1  - 2020/7
AB  - Reinforcement learning (RL) has been recognized as a powerful tool for many real-world tasks in decision making, data mining, and information retrieval. Many RL algorithms are well developed; however, tasks in partially observable environments, e.g., POMDPs (Partially Observable Markov Decision Processes), remain very challenging. Recent attempts to address this issue memorize long-term historical information using deep neural networks. The common strategy is to leverage recurrent networks, e.g., Long Short-Term Memory (LSTM), to retain and encode historical information in order to estimate the true state of the environment under partial observability. However, when confronted with problems that depend on a rather long history and with irregular data sampling, the conventional LSTM is ill-suited and difficult to train because of the well-known vanishing-gradient problem and its inadequacy in capturing long-term history. In this paper, we propose to use Phased LSTM for POMDP tasks. Phased LSTM introduces an additional time gate that periodically updates the memory cell, helping the network to 1) maintain long-term information and 2) propagate gradients better, which facilitates the training of reinforcement learning models with recurrent structure. To further adapt to reinforcement learning and boost performance, we also propose a Self-Phased LSTM that incorporates a periodic gate and can generate a dynamic periodic gate that adjusts automatically to more tasks, especially the notoriously difficult ones with sparse rewards. Our experimental results verify the effectiveness of Phased LSTM and Self-Phased LSTM for POMDP tasks.
KW  - LSTM
KW  - POMDPs
KW  - Reinforcement Learning
KW  - Time Gate
UR  - http://www.scopus.com/inward/record.url?scp=85092060346&partnerID=8YFLogxK
U2  - 10.1109/DSC50466.2020.00042
DO  - 10.1109/DSC50466.2020.00042
M3  - Conference contribution
AN  - SCOPUS:85092060346
T3  - Proceedings - 2020 IEEE 5th International Conference on Data Science in Cyberspace, DSC 2020
SP  - 232
EP  - 237
BT  - Proceedings - 2020 IEEE 5th International Conference on Data Science in Cyberspace, DSC 2020
PB  - Institute of Electrical and Electronics Engineers Inc.
T2  - 5th IEEE International Conference on Data Science in Cyberspace, DSC 2020
Y2  - 27 July 2020 through 29 July 2020
ER  -

Wang X, Li X, Lai L. On improving the learning of long-term historical information for tasks with partial observability. In Proceedings - 2020 IEEE 5th International Conference on Data Science in Cyberspace, DSC 2020. Institute of Electrical and Electronics Engineers Inc. 2020. p. 232-237. 9172883. (Proceedings - 2020 IEEE 5th International Conference on Data Science in Cyberspace, DSC 2020). doi: 10.1109/DSC50466.2020.00042
