TY - GEN
T1 - Autonomous Vehicles Roundup Strategy by Reinforcement Learning with Prediction Trajectory
AU - Ni, Jiayang
AU - Ma, Rubing
AU - Zhong, Hua
AU - Wang, Bo
N1 - Publisher Copyright:
© 2022 Technical Committee on Control Theory, Chinese Association of Automation.
PY - 2022
Y1 - 2022
N2 - Autonomous vehicles are increasingly applied in many situations, but their autonomous decision-making ability needs to be improved. Multi-Agent Deep Deterministic Policy Gradient (MADDPG) adopts centralized evaluation with decentralized execution, so that each autonomous vehicle can obtain whole-field status information and make decisions using its companions' information. During training of the autonomous vehicles, we introduce an artificial potential field, action guidance, and other methods to alleviate the problem of sparse rewards. We also add a repulsion function to account for the relationships between team vehicles. An Extended Kalman Filter (EKF) is applied to predict the autonomous vehicle trajectory, changing the state input to the training network. In addition, a secondary correction of the predicted trajectory adjusts the prediction range as training progresses, improving convergence speed even as the speed of the opposing agents increases. Simulation experiments show that the convergence speed and win rate of the MADDPG algorithm based on trajectory prediction and an artificial potential field are significantly improved, and that it adapts well to various task scenarios.
AB - Autonomous vehicles are increasingly applied in many situations, but their autonomous decision-making ability needs to be improved. Multi-Agent Deep Deterministic Policy Gradient (MADDPG) adopts centralized evaluation with decentralized execution, so that each autonomous vehicle can obtain whole-field status information and make decisions using its companions' information. During training of the autonomous vehicles, we introduce an artificial potential field, action guidance, and other methods to alleviate the problem of sparse rewards. We also add a repulsion function to account for the relationships between team vehicles. An Extended Kalman Filter (EKF) is applied to predict the autonomous vehicle trajectory, changing the state input to the training network. In addition, a secondary correction of the predicted trajectory adjusts the prediction range as training progresses, improving convergence speed even as the speed of the opposing agents increases. Simulation experiments show that the convergence speed and win rate of the MADDPG algorithm based on trajectory prediction and an artificial potential field are significantly improved, and that it adapts well to various task scenarios.
KW - artificial potential field
KW - autonomous vehicle roundup
KW - reinforcement learning
KW - trajectory prediction
UR - http://www.scopus.com/inward/record.url?scp=85140450095&partnerID=8YFLogxK
U2 - 10.23919/CCC55666.2022.9902245
DO - 10.23919/CCC55666.2022.9902245
M3 - Conference contribution
AN - SCOPUS:85140450095
T3 - Chinese Control Conference, CCC
SP - 3370
EP - 3375
BT - Proceedings of the 41st Chinese Control Conference, CCC 2022
A2 - Li, Zhijun
A2 - Sun, Jian
PB - IEEE Computer Society
T2 - 41st Chinese Control Conference, CCC 2022
Y2 - 25 July 2022 through 27 July 2022
ER -