TY - JOUR
T1 - An improved elitist-Q-Learning path planning strategy for VTOL air-ground vehicle using convolutional neural network mode prediction
AU - Zhao, Jing
AU - Yang, Chao
AU - Wang, Weida
AU - Li, Ying
AU - Qie, Tianqi
AU - Xu, Bin
N1 - Publisher Copyright:
© 2025 Elsevier Ltd
PY - 2025/5
Y1 - 2025/5
N2 - Vertical take-off and landing (VTOL) air-ground integrated vehicles have received extensive attention in rescue, transportation, and other task fields. To further improve task efficiency in complex environments such as post-disaster cities and scrubland, these vehicles require efficient and rational path planning. In such environments, it is difficult to obtain complete and accurate obstacle information. The planning process therefore faces the technical difficulties of switching air-ground modes with limited obstacle perception information and quickly acquiring the optimal trajectory with the shortest distance. To address these issues, this paper proposes an improved elitist-Q-Learning path planning strategy for the VTOL air-ground vehicle using convolutional neural network mode prediction. Firstly, to predict the mode-switching actions, a convolutional neural mode prediction network is constructed with local obstacle information as input data. Secondly, based on the predicted actions, an elitist-Q-Learning (EQL) multi-mode planning algorithm is designed, and a new reward function considering the multi-mode actions is proposed. On this basis, heuristic correction and elitist adjusting factors replace the fixed rewards of traditional Q-Learning with dynamically adjusted rewards during the iterative process, so that the Q table is quickly updated to converge to optimal values. Finally, the proposed strategy is verified on randomly generated 1000 m × 1000 m maps. Results show that the prediction accuracy remains above 93 %. The path distance is reduced by 4.56 % and 1.75 % compared with that of traditional Q-Learning and A* with mode prediction, respectively, and equals that of BAS-A*, LPA*, and D* Lite with mode prediction. Compared with traditional Q-Learning, the strategy reduces computational time by 36.61 % and, upon convergence, requires 58.9 % fewer iterations.
AB - Vertical take-off and landing (VTOL) air-ground integrated vehicles have received extensive attention in rescue, transportation, and other task fields. To further improve task efficiency in complex environments such as post-disaster cities and scrubland, these vehicles require efficient and rational path planning. In such environments, it is difficult to obtain complete and accurate obstacle information. The planning process therefore faces the technical difficulties of switching air-ground modes with limited obstacle perception information and quickly acquiring the optimal trajectory with the shortest distance. To address these issues, this paper proposes an improved elitist-Q-Learning path planning strategy for the VTOL air-ground vehicle using convolutional neural network mode prediction. Firstly, to predict the mode-switching actions, a convolutional neural mode prediction network is constructed with local obstacle information as input data. Secondly, based on the predicted actions, an elitist-Q-Learning (EQL) multi-mode planning algorithm is designed, and a new reward function considering the multi-mode actions is proposed. On this basis, heuristic correction and elitist adjusting factors replace the fixed rewards of traditional Q-Learning with dynamically adjusted rewards during the iterative process, so that the Q table is quickly updated to converge to optimal values. Finally, the proposed strategy is verified on randomly generated 1000 m × 1000 m maps. Results show that the prediction accuracy remains above 93 %. The path distance is reduced by 4.56 % and 1.75 % compared with that of traditional Q-Learning and A* with mode prediction, respectively, and equals that of BAS-A*, LPA*, and D* Lite with mode prediction. Compared with traditional Q-Learning, the strategy reduces computational time by 36.61 % and, upon convergence, requires 58.9 % fewer iterations.
KW - Dynamic reward
KW - elitist-Q-Learning
KW - Mode prediction
KW - Path planning
KW - VTOL air-ground vehicle
UR - http://www.scopus.com/inward/record.url?scp=105001838950&partnerID=8YFLogxK
U2 - 10.1016/j.aei.2025.103316
DO - 10.1016/j.aei.2025.103316
M3 - Article
AN - SCOPUS:105001838950
SN - 1474-0346
VL - 65
JO - Advanced Engineering Informatics
JF - Advanced Engineering Informatics
M1 - 103316
ER -