TY - GEN
T1 - Observation-Time-Action Deep Stacking Strategy
T2 - 2024 International Joint Conference on Neural Networks, IJCNN 2024
AU - Jiang, Keyang
AU - Wang, Qiang
AU - Xu, Yahao
AU - Deng, Hongbin
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Reinforcement learning tasks involving visual input continue to pose a challenge in the presence of partial observability. Although prior research has introduced methods such as LSTM, GTrXL, and DNC, each of these approaches has its own limitations. To address partial observability in a more universal context, this paper proposes the Observation-Time-Action deep stacking algorithm. First, observations, actions, and time data are combined into a tuple and stacked into a longer sequence. Then, convolutional and fully connected layers extract relevant features from the sequence, which are fed into the algorithm for processing. We designed a number of partially observable experiments corresponding to typical reinforcement learning scenarios. The experimental results demonstrate that the proposed method achieves a higher success rate. Moreover, we investigated the effect of stacking frame length and of different reinforcement learning elements on the algorithm. Finally, we conducted a Hardware-in-the-Loop (HITL) experiment to further verify the effectiveness of our algorithm.
AB - Reinforcement learning tasks involving visual input continue to pose a challenge in the presence of partial observability. Although prior research has introduced methods such as LSTM, GTrXL, and DNC, each of these approaches has its own limitations. To address partial observability in a more universal context, this paper proposes the Observation-Time-Action deep stacking algorithm. First, observations, actions, and time data are combined into a tuple and stacked into a longer sequence. Then, convolutional and fully connected layers extract relevant features from the sequence, which are fed into the algorithm for processing. We designed a number of partially observable experiments corresponding to typical reinforcement learning scenarios. The experimental results demonstrate that the proposed method achieves a higher success rate. Moreover, we investigated the effect of stacking frame length and of different reinforcement learning elements on the algorithm. Finally, we conducted a Hardware-in-the-Loop (HITL) experiment to further verify the effectiveness of our algorithm.
KW - Partial Observability Problems
KW - Reinforcement learning
KW - Visual perception
UR - http://www.scopus.com/inward/record.url?scp=85204994951&partnerID=8YFLogxK
U2 - 10.1109/IJCNN60899.2024.10650736
DO - 10.1109/IJCNN60899.2024.10650736
M3 - Conference contribution
AN - SCOPUS:85204994951
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - 2024 International Joint Conference on Neural Networks, IJCNN 2024 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 30 June 2024 through 5 July 2024
ER -