TY - GEN
T1 - Reinforcement-learning-based path planning for UAVs in intensive obstacle environment
AU - Guo, Miao
AU - Long, Teng
AU - Li, Hui
AU - Sun, Jingliang
N1 - Publisher Copyright:
© 2021 IEEE
PY - 2021
Y1 - 2021
N2 - In an intensive obstacle environment, the available flying space is narrow, making it difficult to generate a feasible path for UAVs within a limited runtime. In this paper, a Q-learning-based planning algorithm is presented to improve the efficiency of single-UAV path planning in intensive obstacle environments. By constructing an offline learning planning architecture over the state-action space, the proposed method realizes rapid UAV path planning and avoids the high time consumption of online reinforcement-learning path planning. To address the cost of Q-table re-training, a probabilistic local update mechanism is proposed that updates the Q-values of selected states, reducing the time required for Q-table re-training and enabling rapid Q-table updates. The probability of a Q-value update depends on the distance to the new obstacle: the closer a state is to the new obstacle, the higher its probability of re-training. Therefore, the flight trajectory can be quickly re-planned when the environment changes. Simulation results show that the proposed Q-learning-based planning algorithm can generate obstacle-avoiding paths for a UAV from random start positions. Compared with the classical A* algorithm, the path planning time based on the trained Q-table is reduced from seconds to milliseconds, which significantly improves the efficiency of path planning.
AB - In an intensive obstacle environment, the available flying space is narrow, making it difficult to generate a feasible path for UAVs within a limited runtime. In this paper, a Q-learning-based planning algorithm is presented to improve the efficiency of single-UAV path planning in intensive obstacle environments. By constructing an offline learning planning architecture over the state-action space, the proposed method realizes rapid UAV path planning and avoids the high time consumption of online reinforcement-learning path planning. To address the cost of Q-table re-training, a probabilistic local update mechanism is proposed that updates the Q-values of selected states, reducing the time required for Q-table re-training and enabling rapid Q-table updates. The probability of a Q-value update depends on the distance to the new obstacle: the closer a state is to the new obstacle, the higher its probability of re-training. Therefore, the flight trajectory can be quickly re-planned when the environment changes. Simulation results show that the proposed Q-learning-based planning algorithm can generate obstacle-avoiding paths for a UAV from random start positions. Compared with the classical A* algorithm, the path planning time based on the trained Q-table is reduced from seconds to milliseconds, which significantly improves the efficiency of path planning.
KW - Q-learning
KW - UAV
KW - offline training
KW - path planning
KW - probabilistic local update mechanism
KW - reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85128041443&partnerID=8YFLogxK
U2 - 10.1109/CAC53003.2021.9727746
DO - 10.1109/CAC53003.2021.9727746
M3 - Conference contribution
AN - SCOPUS:85128041443
T3 - Proceeding - 2021 China Automation Congress, CAC 2021
SP - 6451
EP - 6455
BT - Proceeding - 2021 China Automation Congress, CAC 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 China Automation Congress, CAC 2021
Y2 - 22 October 2021 through 24 October 2021
ER -