TY - JOUR
T1 - Double Deep Q-Learning Network for Package Pickup and Delivery Route Prediction
AU - Gubena, Metages Molla
AU - Zhang, Yuli
AU - Ma, Ziqian
AU - Ma, Xiaoyu
AU - Yang, Yucai
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
AB - Package pickup and delivery services are essential for real-time logistics, where riders must navigate operational constraints (e.g., delivery deadlines, pickup-delivery precedence), personalized routing preferences, and dynamic demand fluctuations. Existing approaches exhibit critical limitations: combinatorial optimization (CO) often ignores rider preferences; deep learning (DL) struggles to enforce constraints; and deep reinforcement learning (DRL) faces sparse rewards and training instability. To address these challenges, we propose a novel hybrid framework, the double deep Q-learning network for package pickup and delivery route prediction (DDQN-PPDRP), which integrates double deep Q-networks with constraint-aware optimization. Our method introduces three novel components: (1) an ensemble network combining a traveling salesman model with time windows for constraint satisfaction and an enhanced DeepRoute transformer (i-DeepRoute) with precedence-aware masking to model rider-specific routing behavior; (2) dual-reward functions incorporating complete match rate (CMR) for strict route alignment and edit distance (ED) for incremental feedback, alleviating reward sparsity; and (3) adaptive prioritized experience replay (APER) with a double-buffer strategy to stabilize training. Evaluated on real-world data from Dada Express, DDQN-PPDRP achieves significant improvements over state-of-the-art baselines: a 29.14% reduction in location square deviation, 48.33% lower ED, 19.20% higher CMR, and 6.67% improved rank correlation. It also reduces network parameters by 85% compared to pure DL methods and cuts training time by 57.7% through APER. These results demonstrate that DDQN-PPDRP effectively balances operational constraints, rider preferences, and computational efficiency, harmonizing CO’s interpretability with DRL’s adaptability to advance scalable, constraint-compliant route prediction in dynamic logistics environments.
KW - adaptive prioritized experience replay
KW - Double deep Q-learning network
KW - dual-reward function
KW - ensemble learning
KW - pickup and delivery route prediction
UR - http://www.scopus.com/inward/record.url?scp=105008035487&partnerID=8YFLogxK
DO - 10.1109/TCE.2025.3577902
M3 - Article
AN - SCOPUS:105008035487
SN - 0098-3063
JO - IEEE Transactions on Consumer Electronics
JF - IEEE Transactions on Consumer Electronics
ER -