TY - GEN
T1 - ROP-DARL
T2 - 28th International Conference on Intelligent Transportation Systems, ITSC 2025
AU - Li, Yaqing
AU - Li, Xinke
AU - Fu, Mengyin
AU - Yang, Yi
AU - Zhang, Ting
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
AB - Reinforcement learning algorithms are widely applied to autonomous driving decision-making in complex interactive environments. However, ensuring safety remains a significant challenge. Although safe reinforcement learning methods have been proposed, they still struggle to balance safety and efficiency in decision-making. To address these challenges, this work proposes a Risk-aware Optimistic-Pessimistic Dual-Actor Reinforcement Learning (ROP-DARL) approach, which enhances the safety performance of the model in three respects. First, we introduce a trajectory prediction model for scenario understanding and rank the predicted trajectories based on risk field theory. Second, the proposed dual policies generate hybrid strategies that dynamically balance the efficiency and safety of decision-making. Specifically, the optimistic actor fully utilizes prediction information to learn efficient strategies, while the pessimistic actor considers only high-risk predictions to generate cautious strategies. Finally, we employ the action mask method and examine how it affects the model's safety performance, which further verifies the robustness of the proposed model. Experiments show that in three interactive traffic scenarios, the proposed model achieves higher success rates and better safety guarantees even with reduced action masking.
KW - interactive decision-making
KW - reinforcement learning
KW - safety mask
UR - https://www.scopus.com/pages/publications/105036995135
U2 - 10.1109/ITSC60802.2025.11423265
DO - 10.1109/ITSC60802.2025.11423265
M3 - Conference contribution
AN - SCOPUS:105036995135
T3 - IEEE Conference on Intelligent Transportation Systems, Proceedings, ITSC
SP - 1848
EP - 1855
BT - IEEE Intelligent Transportation Systems Conference, ITSC 2025
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 18 November 2025 through 21 November 2025
ER -