TY - GEN
T1 - Self-play Decision-making Method of Deep Reinforcement Learning Guided by Behavior Tree under Complex Environment
AU - Xiong, Xiaochen
AU - Wang, Shuai
AU - Wang, Bo
N1 - Publisher Copyright:
© 2024 Technical Committee on Control Theory, Chinese Association of Automation.
PY - 2024
Y1 - 2024
N2 - With advances in artificial intelligence, military simulation has evolved from human-to-human exercises to autonomous self-improvement through reinforcement-learning self-play. Unlike air and sea engagements, the complexity of terrain in land battles cannot be ignored. On a 2D map of the real world, we introduce elevation data through a grid map to construct basic scenes and apply fuzzy theory to characterize the nuances of different terrain. Based on the latest unmanned combat vehicle data, a land combat agent model is designed, and the deep deterministic policy gradient (DDPG) and proximal policy optimization (PPO) algorithms are adopted. Through self-play, the vehicle's strategy is optimized so that it can effectively adapt to various battlefield scenarios. This method not only enhances the realism of the simulation by incorporating key terrain features, but also significantly improves the strategic capability of autonomous agents. By constantly playing against themselves to perfect their tactics, these agents can better face the unpredictable dynamics of land warfare. Our results show that training with DDPG and PPO algorithms guided by behavior trees significantly improves the agents' ability to navigate and engage in complex terrain, demonstrating the potential of AI-driven simulation to shape future military strategy and training.
AB - With advances in artificial intelligence, military simulation has evolved from human-to-human exercises to autonomous self-improvement through reinforcement-learning self-play. Unlike air and sea engagements, the complexity of terrain in land battles cannot be ignored. On a 2D map of the real world, we introduce elevation data through a grid map to construct basic scenes and apply fuzzy theory to characterize the nuances of different terrain. Based on the latest unmanned combat vehicle data, a land combat agent model is designed, and the deep deterministic policy gradient (DDPG) and proximal policy optimization (PPO) algorithms are adopted. Through self-play, the vehicle's strategy is optimized so that it can effectively adapt to various battlefield scenarios. This method not only enhances the realism of the simulation by incorporating key terrain features, but also significantly improves the strategic capability of autonomous agents. By constantly playing against themselves to perfect their tactics, these agents can better face the unpredictable dynamics of land warfare. Our results show that training with DDPG and PPO algorithms guided by behavior trees significantly improves the agents' ability to navigate and engage in complex terrain, demonstrating the potential of AI-driven simulation to shape future military strategy and training.
KW - behavior trees
KW - deep reinforcement learning
KW - fuzzy theory
KW - self-play
UR - http://www.scopus.com/inward/record.url?scp=85205468937&partnerID=8YFLogxK
U2 - 10.23919/CCC63176.2024.10662399
DO - 10.23919/CCC63176.2024.10662399
M3 - Conference contribution
AN - SCOPUS:85205468937
T3 - Chinese Control Conference, CCC
SP - 3988
EP - 3993
BT - Proceedings of the 43rd Chinese Control Conference, CCC 2024
A2 - Na, Jing
A2 - Sun, Jian
PB - IEEE Computer Society
T2 - 43rd Chinese Control Conference, CCC 2024
Y2 - 28 July 2024 through 31 July 2024
ER -