An improved elitist-Q-Learning path planning strategy for VTOL air-ground vehicle using convolutional neural network mode prediction

Jing Zhao, Chao Yang*, Weida Wang, Ying Li, Tianqi Qie, Bin Xu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Vertical take-off and landing (VTOL) air-ground integrated vehicles have received extensive attention in rescue, transportation, and other task fields. To further improve task efficiency in complex environments such as post-disaster cities and scrubland, such vehicles require efficient and rational path planning. In these environments, it is difficult to obtain complete and accurate obstacle information, so the planning process faces two technical difficulties: using limited obstacle perception information to switch between air and ground modes, and quickly acquiring the optimal planning trajectory with the shortest distance. To address these issues, this paper proposes an improved elitist-Q-Learning path planning strategy for the VTOL air-ground vehicle using convolutional neural network mode prediction. Firstly, to predict the mode-switching actions, a convolutional neural mode prediction network is constructed with local obstacle information as input data. Secondly, based on the predicted actions, an elitist-Q-Learning (EQL) multi-mode planning algorithm is designed, and a new reward function considering the multi-mode actions is proposed. On this basis, heuristic correction and elitist adjusting factors replace the fixed rewards of traditional Q-Learning with rewards that are dynamically adjusted during the iterative process, so the Q table is quickly updated and converges to optimal values. Finally, the proposed strategy is verified on randomly generated maps of 1000 m × 1000 m. Results show that the prediction accuracy remains above 93%. The path distance is reduced by 4.56% and 1.75% compared with traditional Q-Learning and A* with mode prediction, respectively, and matches that of BAS-A*, LPA*, and D* Lite with mode prediction. Compared with traditional Q-Learning, the strategy reduces computational time by 36.61% and, at convergence, requires 58.9% fewer iterations.
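The core EQL idea described in the abstract (replacing Q-Learning's fixed rewards with rewards that are dynamically adjusted by a heuristic correction and an elitist factor) can be sketched in a minimal grid-world form. This is an illustrative reconstruction, not the paper's implementation: the specific shaping terms (Manhattan-distance correction, a bonus along the best episode found so far), the grid, the coefficients, and the omission of air-ground mode switching are all assumptions for the sake of a runnable example.

```python
import random

# Hypothetical sketch of elitist Q-Learning with dynamically adjusted
# rewards on a small obstacle-free grid. The heuristic correction
# (distance-to-goal shaping) and the elitist bonus (extra reward on
# states of the best episode found so far) are illustrative stand-ins
# for the paper's factors; mode switching is omitted.

SIZE = 10
GOAL = (SIZE - 1, SIZE - 1)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # ground moves only

def heuristic(s):
    # Manhattan distance to the goal, used as the correction term.
    return abs(GOAL[0] - s[0]) + abs(GOAL[1] - s[1])

def step(s, a):
    nx, ny = s[0] + a[0], s[1] + a[1]
    if not (0 <= nx < SIZE and 0 <= ny < SIZE):
        return s, -1.0  # bumped into the map boundary
    return (nx, ny), -0.1  # small cost per move

def train(episodes=400, alpha=0.5, gamma=0.95, eps=0.2, seed=0):
    rng = random.Random(seed)
    Q = {}                      # sparse Q table: (state, action) -> value
    elite_path = set()          # states on the shortest episode so far
    best_len = float("inf")
    for _ in range(episodes):
        s, path, steps = (0, 0), [(0, 0)], 0
        while s != GOAL and steps < 4 * SIZE * SIZE:
            if rng.random() < eps:  # epsilon-greedy exploration
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: Q.get((s, i), 0.0))
            s2, r = step(s, ACTIONS[a])
            # Dynamic reward: heuristic correction pulls toward the goal,
            # elitist factor rewards states on the current best path.
            r += 0.05 * (heuristic(s) - heuristic(s2))
            if s2 in elite_path:
                r += 0.05
            if s2 == GOAL:
                r += 10.0
            best_next = max(Q.get((s2, i), 0.0) for i in range(len(ACTIONS)))
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (
                r + gamma * best_next - Q.get((s, a), 0.0))
            s = s2
            path.append(s)
            steps += 1
        # Elitist update: keep the shortest goal-reaching episode.
        if s == GOAL and steps < best_len:
            best_len, elite_path = steps, set(path)
    return Q, best_len
```

On this toy map the shortest path from (0, 0) to (9, 9) takes 18 moves, so `best_len` can approach but never beat 18; the shaping and elitist terms only bias the reward signal, leaving the Q-update rule itself unchanged.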

Original language: English
Article number: 103316
Journal: Advanced Engineering Informatics
Volume: 65
DOIs
Publication status: Published - May 2025

Keywords

  • Dynamic reward
  • elitist-Q-Learning
  • Mode prediction
  • Path planning
  • VTOL air-ground vehicle
