TY - GEN
T1 - OB-HPPO
T2 - 20th International Conference on Intelligent Computing, ICIC 2024
AU - Jiang, Ruilin
AU - Zhai, Yanlong
AU - Zheng, Yan
AU - Li, You
AU - Liu, Yanglin
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024.
PY - 2024
Y1 - 2024
N2 - The multi-agent real-time strategy game is a classic problem in reinforcement learning, and solving it has practical significance for economic and military applications. In recent years, researchers have made breakthroughs on related problems, but most existing approaches target specific environments or require high-performance computing platforms, so the time and resources consumed in training grow exponentially as the complexity and scope of a task increase. In this paper, we propose OB-HPPO, an option- and intrinsic-curiosity-based hierarchical reinforcement learning framework that addresses these challenges. Our approach hierarchically decomposes a huge action space into several self-explainable options, simplifying a single decision over atomic actions into a series of smaller action decisions. OB-HPPO also introduces an intrinsic curiosity module (ICM) on top of the Proximal Policy Optimization (PPO) algorithm to improve the efficiency of training and exploration. Experimental results show that OB-HPPO requires less training time and accumulates more reward than non-hierarchical models. We also test OB-HPPO against representative AI agents in the μRTS environment, and OB-HPPO achieves a significantly higher win rate.
KW - Hierarchical reinforcement learning
KW - Modular hierarchical command
KW - Option
KW - Proximal policy optimization
KW - Real-time strategy game
UR - http://www.scopus.com/inward/record.url?scp=85201124818&partnerID=8YFLogxK
U2 - 10.1007/978-981-97-5581-3_36
DO - 10.1007/978-981-97-5581-3_36
M3 - Conference contribution
AN - SCOPUS:85201124818
SN - 9789819755806
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 443
EP - 454
BT - Advanced Intelligent Computing Technology and Applications - 20th International Conference, ICIC 2024, Proceedings
A2 - Huang, De-Shuang
A2 - Pan, Yijie
A2 - Zhang, Xiankun
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 5 August 2024 through 8 August 2024
ER -