OB-HPPO: An Option and Intrinsic Curiosity Based Hierarchical Reinforcement Learning Approach for Real-Time Strategy Games

Ruilin Jiang; Yanlong Zhai; Yan Zheng; You Li; Yanglin Liu

doi:10.1007/978-981-97-5581-3_36

*Corresponding author for this work: Yanlong Zhai

School of Cyberspace Science and Technology

Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

Abstract

The multi-agent real-time strategy (RTS) game problem is a classic problem in reinforcement learning, and solving it offers valuable guidance for real-world economic and military applications. In recent years, researchers in many countries have made breakthroughs on related problems, but most existing techniques target specific environments or require high-performance computing platforms. As a result, the time and resources needed to train models grow exponentially as the complexity and scope of a task increase. In this paper, we propose OB-HPPO, an option and intrinsic curiosity based hierarchical reinforcement learning framework, to address these challenges. Our approach hierarchically decomposes a huge action space into several self-explainable options, breaking each atomic action decision down into a series of smaller action decisions. OB-HPPO also introduces an intrinsic curiosity module (ICM) on top of the Proximal Policy Optimization (PPO) algorithm to improve the efficiency of training and exploration. Experimental results show that OB-HPPO requires less training time and accumulates more reward than non-hierarchical models. We also test OB-HPPO against several representative AI agents in the μRTS environment, and OB-HPPO's win rate improves significantly.

Original language	English
Host publication title	Advanced Intelligent Computing Technology and Applications - 20th International Conference, ICIC 2024, Proceedings
Editors	De-Shuang Huang, Yijie Pan, Xiankun Zhang
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	443-454
Number of pages	12
ISBN (Print)	9789819755806
DOI	https://doi.org/10.1007/978-981-97-5581-3_36
Publication status	Published - 2024
Event	20th International Conference on Intelligent Computing, ICIC 2024 - Tianjin, China. Duration: 5 Aug 2024 → 8 Aug 2024

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	14863 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	20th International Conference on Intelligent Computing, ICIC 2024
Country/Territory	China
City	Tianjin
Period	5/08/24 → 8/08/24


Cite this

Jiang, R., Zhai, Y., Zheng, Y., Li, Y., & Liu, Y. (2024). OB-HPPO: An Option and Intrinsic Curiosity Based Hierarchical Reinforcement Learning Approach for Real-Time Strategy Games. In D.-S. Huang, Y. Pan, & X. Zhang (Eds.), Advanced Intelligent Computing Technology and Applications - 20th International Conference, ICIC 2024, Proceedings (pp. 443-454). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 14863 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-97-5581-3_36

Jiang, Ruilin ; Zhai, Yanlong ; Zheng, Yan et al. / OB-HPPO: An Option and Intrinsic Curiosity Based Hierarchical Reinforcement Learning Approach for Real-Time Strategy Games. Advanced Intelligent Computing Technology and Applications - 20th International Conference, ICIC 2024, Proceedings. ed. / De-Shuang Huang ; Yijie Pan ; Xiankun Zhang. Springer Science and Business Media Deutschland GmbH, 2024. pp. 443-454 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{6062baaffdf04f339f8684ef3f2618ee,
  title = "{OB-HPPO}: An Option and Intrinsic Curiosity Based Hierarchical Reinforcement Learning Approach for Real-Time Strategy Games",
  abstract = "The multi-agent real-time strategy game problem is a classic problem in the field of reinforcement learning, and solving such a problem is of high instructive significance to the economic and military fields in real society. In recent years, researchers from many countries have made breakthroughs in the related problems, but most related technologies target specific environments or require high computing power platforms. This leads to an exponential increase in the time and resources consumed in training models when the complexity and scope of a task increases. In this paper, we proposed OB-HPPO, an option and intrinsic curiosity based hierarchical reinforcement learning framework to address these challenges. Our approach hierarchically decomposes a huge action space into several self-explainable options, simplifying atomic action decisions into a series of action decisions. OB-HPPO also introduces an intrinsic curiosity module (ICM) based on the Proximal Policy Optimization (PPO) algorithm to improve the efficiency of model training and exploration. Experimental results show that OB-HPPO takes less training time and accumulates more rewards than non-hierarchical models. We also test OB-HPPO against some representative AI models of the $\mu$RTS environment, and OB-HPPO's winning rate is significantly improved.",
  keywords = "Hierarchical reinforcement learning, Modular hierarchical command, Option, Proximal policy optimization, Real-time strategy game",
  author = "Ruilin Jiang and Yanlong Zhai and Yan Zheng and You Li and Yanglin Liu",
  note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024.; 20th International Conference on Intelligent Computing, ICIC 2024 ; Conference date: 05-08-2024 Through 08-08-2024",
  year = "2024",
  doi = "10.1007/978-981-97-5581-3_36",
  language = "English",
  isbn = "9789819755806",
  series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
  volume = "14863",
  publisher = "Springer Science and Business Media Deutschland GmbH",
  pages = "443--454",
  editor = "De-Shuang Huang and Yijie Pan and Xiankun Zhang",
  booktitle = "Advanced Intelligent Computing Technology and Applications - 20th International Conference, ICIC 2024, Proceedings",
  address = "Germany",
}

Jiang, R, Zhai, Y, Zheng, Y, Li, Y & Liu, Y 2024, OB-HPPO: An Option and Intrinsic Curiosity Based Hierarchical Reinforcement Learning Approach for Real-Time Strategy Games. in D-S Huang, Y Pan & X Zhang (eds), Advanced Intelligent Computing Technology and Applications - 20th International Conference, ICIC 2024, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 14863 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 443-454, 20th International Conference on Intelligent Computing, ICIC 2024, Tianjin, China, 5/08/24. https://doi.org/10.1007/978-981-97-5581-3_36

OB-HPPO: An Option and Intrinsic Curiosity Based Hierarchical Reinforcement Learning Approach for Real-Time Strategy Games. / Jiang, Ruilin; Zhai, Yanlong; Zheng, Yan et al.
Advanced Intelligent Computing Technology and Applications - 20th International Conference, ICIC 2024, Proceedings. ed. / De-Shuang Huang; Yijie Pan; Xiankun Zhang. Springer Science and Business Media Deutschland GmbH, 2024. p. 443-454 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 14863 LNCS).

TY  - GEN
T1  - OB-HPPO: An Option and Intrinsic Curiosity Based Hierarchical Reinforcement Learning Approach for Real-Time Strategy Games
T2  - 20th International Conference on Intelligent Computing, ICIC 2024
AU  - Jiang, Ruilin
AU  - Zhai, Yanlong
AU  - Zheng, Yan
AU  - Li, You
AU  - Liu, Yanglin
N1  - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024.
PY  - 2024
Y1  - 2024
N2  - The multi-agent real-time strategy game problem is a classic problem in the field of reinforcement learning, and solving such a problem is of high instructive significance to the economic and military fields in real society. In recent years, researchers from many countries have made breakthroughs in the related problems, but most related technologies target specific environments or require high computing power platforms. This leads to an exponential increase in the time and resources consumed in training models when the complexity and scope of a task increases. In this paper, we proposed OB-HPPO, an option and intrinsic curiosity based hierarchical reinforcement learning framework to address these challenges. Our approach hierarchically decomposes a huge action space into several self-explainable options, simplifying atomic action decisions into a series of action decisions. OB-HPPO also introduces an intrinsic curiosity module (ICM) based on the Proximal Policy Optimization (PPO) algorithm to improve the efficiency of model training and exploration. Experimental results show that OB-HPPO takes less training time and accumulates more rewards than non-hierarchical models. We also test OB-HPPO against some representative AI models of the μRTS environment, and OB-HPPO's winning rate is significantly improved.
AB  - The multi-agent real-time strategy game problem is a classic problem in the field of reinforcement learning, and solving such a problem is of high instructive significance to the economic and military fields in real society. In recent years, researchers from many countries have made breakthroughs in the related problems, but most related technologies target specific environments or require high computing power platforms. This leads to an exponential increase in the time and resources consumed in training models when the complexity and scope of a task increases. In this paper, we proposed OB-HPPO, an option and intrinsic curiosity based hierarchical reinforcement learning framework to address these challenges. Our approach hierarchically decomposes a huge action space into several self-explainable options, simplifying atomic action decisions into a series of action decisions. OB-HPPO also introduces an intrinsic curiosity module (ICM) based on the Proximal Policy Optimization (PPO) algorithm to improve the efficiency of model training and exploration. Experimental results show that OB-HPPO takes less training time and accumulates more rewards than non-hierarchical models. We also test OB-HPPO against some representative AI models of the μRTS environment, and OB-HPPO's winning rate is significantly improved.
KW  - Hierarchical reinforcement learning
KW  - Modular hierarchical command
KW  - Option
KW  - Proximal policy optimization
KW  - Real-time strategy game
UR  - http://www.scopus.com/inward/record.url?scp=85201124818&partnerID=8YFLogxK
U2  - 10.1007/978-981-97-5581-3_36
DO  - 10.1007/978-981-97-5581-3_36
M3  - Conference contribution
AN  - SCOPUS:85201124818
SN  - 9789819755806
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 443
EP  - 454
BT  - Advanced Intelligent Computing Technology and Applications - 20th International Conference, ICIC 2024, Proceedings
A2  - Huang, De-Shuang
A2  - Pan, Yijie
A2  - Zhang, Xiankun
PB  - Springer Science and Business Media Deutschland GmbH
Y2  - 5 August 2024 through 8 August 2024
ER  - 

Jiang R, Zhai Y, Zheng Y, Li Y, Liu Y. OB-HPPO: An Option and Intrinsic Curiosity Based Hierarchical Reinforcement Learning Approach for Real-Time Strategy Games. In Huang DS, Pan Y, Zhang X, editors, Advanced Intelligent Computing Technology and Applications - 20th International Conference, ICIC 2024, Proceedings. Springer Science and Business Media Deutschland GmbH. 2024. p. 443-454. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-981-97-5581-3_36
